Objective

Microsoft sees all the big companies creating original video content, and it wants to get in on the fun. The company has decided to create a new movie studio, but it does not know anything about creating movies, so it has hired you to help it better understand the movie industry. Your team is in charge of data analysis that explores what types of films are currently doing best at the box office. You must then translate those findings into actionable insights that the CEO can use when deciding what types of films the studio should create.

Datasets

These are the five datasets that were used in this project.

Movies.csv

  • Columns in the dataset
    • name
    • rating
    • genre
    • year
    • released
    • score
    • votes
    • director
    • writer
    • star
    • country
    • budget
    • gross
    • company
    • runtime

Movie_basics.csv

  • Columns in the dataset
    • movie_id
    • primary_title
    • original_title
    • start_year
    • runtime_minutes
    • genres

Tn.movie_budgets.csv.gz

  • Columns in the dataset
    • id
    • release_date
    • movie
    • production_budget
    • domestic_gross
    • worldwide_gross

Movie_ratings.csv

  • Columns in the dataset
    • movie_id
    • averagerating
    • numvotes

Bom.movie_gross.csv.gz

  • Columns in the dataset
    • movie
    • studio
    • domestic_gross
    • foreign_gross
    • year

Approaches

The aim of this project is to advise Microsoft on producing profitable, well-liked movies for the box office. To do so, the analysis focuses on Return on Investment, Net Profit Margin, profitability, losses and ticket sales. The data is categorized by system rating, which signifies the type of audience that will potentially be targeted, and this categorization is explored throughout the nine insights. The subjects of the nine insights are: Return on Investment (ROI), Net Profit Margin (NPM), Predictive Analysis, Expenses, Movies that made Profit, Movies that had Losses, Top 20 Highest Profitable Movies vs. Top 20 Lowest Profitable Movies, The Most Successful Movie in the Drama Genre, and Tickets Sold.

The project meets the objective by extracting the data and expressing it through data visualization concepts in each of the nine insights. This makes it possible to explore the data, to build presentations that provide useful information and context, and to compare each genre and system rating with the others. The main genres in the dataframe are Action and Adventure, Drama, Comedy, Documentary, Classics, Art House and International, Musical and Performing Arts, Horror, Mystery and Suspense, Animation, Special Interest, and Kids and Family. There are also five main system rating groups: System R Rating, System PG-13 Rating, System PG Rating, System G Rating and System NR Rating. This project uses four genres, Adventure, Action, Drama and Comedy, which are also the four most used genres in the film industry. Each genre gets its own analysis with the same data visualization concepts, approaches and nine insights. Within each genre, the system ratings are compared with one another to see which rating fits that genre best.

Each genre will have various visualizations that describe and portray the nine insights based on the behavior of the data. However, the ultimate aim is not just to satisfy the approaches and insights, in other words to be biased, but to analyze the data through an objective lens.

Insights

These are the nine main graphs and concepts used to fulfill the objective of this project.
  • 1. Return on Investment or ROI:
    • What is Return on Investment (ROI)? ROI is the financial benefit gained from a business investment. It estimates the gain or loss generated by an investment relative to the amount of money invested, so it tells you whether a business decision is paying off and, if it is not, helps you make beneficial changes. ROI is the first approach to analyzing and understanding this data. In this project ROI is used to understand which movies bring in more profit and which strategies they used to achieve it. Because it puts smaller and larger productions on the same scale, ROI is used to measure how the different system ratings and sub-genres perform compared with one another, and whether investing more in a given system rating or sub-genre would be more profitable. ROI focuses on the investment value of a product: in this analysis it is broken down to the dollar, that is, for every dollar spent, how much was generated? This is the golden question of this approach. Answering it identifies the movies that generate far more than the rest of the herd, shows where adding investment can expand and grow the studio, and gives confidence in predicting the results of that added investment based on the blueprints used by studios with successful movies.
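As a minimal sketch of the per-dollar ROI question, the snippet below computes the profit generated for every dollar invested and averages it per system rating. The field names `rating`, `budget` and `gross` follow the Movies.csv columns; the sample rows and the helper names `roi_per_dollar` and `average_roi_by_rating` are hypothetical and are not taken from the project's own code.

```python
from collections import defaultdict

def roi_per_dollar(budget, gross):
    """Profit generated for every dollar invested: (gross - budget) / budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (gross - budget) / budget

def average_roi_by_rating(movies):
    """Average the per-movie ROI within each system rating."""
    grouped = defaultdict(list)
    for movie in movies:
        grouped[movie["rating"]].append(roi_per_dollar(movie["budget"], movie["gross"]))
    return {rating: sum(values) / len(values) for rating, values in grouped.items()}

# Hypothetical sample rows, not real data
sample_movies = [
    {"name": "A", "rating": "PG-13", "budget": 10_000_000, "gross": 30_000_000},
    {"name": "B", "rating": "PG-13", "budget": 20_000_000, "gross": 30_000_000},
    {"name": "C", "rating": "R", "budget": 5_000_000, "gross": 4_000_000},
]
roi_by_rating = average_roi_by_rating(sample_movies)
print(roi_by_rating)
```

A negative value means that the rating group lost money on average for every dollar spent.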

  • 2. Net Profit Margin or NPM:
    • Net Profit Margin (NPM) is the second approach used to achieve the objective, which is to advise Microsoft on producing profitable movies for the box office. Like ROI, NPM is used to estimate profitability in the business world, but from a different angle. NPM is calculated by breaking down the bottom line of an investment: it reflects how much of the total revenue is left over after all expenses and additional income are accounted for. NPM focuses on expenses, whereas ROI focuses on when, where and what to invest in to pursue gain, expansion and profit. NPM is based on the amount by which revenue from sales exceeds costs. The lower a studio keeps its costs, the larger the profit on each transaction and the higher the margin. If costs rise but sales stay constant, the profit margin shrinks. No matter how high the revenue is, if costs are just as high, all of that generated income goes down the drain: spending a billion dollars to make a billion dollars is not profitable or worth the business. In this approach NPM is used to measure how the different system ratings and sub-genres perform compared with one another when it comes to managing expenses. For example, if a studio's revenue is going up but its costs are going up faster, its profit margin will drop, which is a warning that the studio's business strategy has a problem. The golden question in this approach is how much of the overall revenue the studio walks away with. Even if revenue is good overall, a product line with low margins may be a sign that it should be cut.

      ROI and NPM overall conclusion: Knowing the amount of revenue generated every year is not a good enough guide for studios to know how well they are doing. Some movies generate a lot of profit because of the studio's size and abundance of resources, while other studios increase their profit but spend too much money to do so. ROI and NPM give a better sense of a studio's performance by relating its budget to the profit generated.
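The margin behaviour described above can be sketched in a few lines. This is an illustration under a simplifying assumption: costs are a single number (for example the production budget), whereas a full NPM would also account for marketing, distribution and other income. The function name `net_profit_margin` and all figures are hypothetical.

```python
def net_profit_margin(revenue, costs):
    """Share of revenue left after costs: (revenue - costs) / revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - costs) / revenue

# Hypothetical figures: revenue stays constant while costs rise
revenue = 100_000_000
margin_low_cost = net_profit_margin(revenue, 60_000_000)
margin_high_cost = net_profit_margin(revenue, 80_000_000)

# Spending a billion dollars to make a billion dollars leaves no margin
margin_break_even = net_profit_margin(1_000_000_000, 1_000_000_000)

print(margin_low_cost, margin_high_cost, margin_break_even)
```

With sales held constant, the higher cost figure gives the smaller margin, which is the shrinking-margin warning sign the section describes.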


  • 3. Predictive Analytics:
    • In this approach Linear Regression and Classification Analysis are used to identify distinctive patterns or correlations among the data variables and to identify classes or groups among the movies in this analysis. There are five predictive analyses, each based on different variables: one Linear Regression Analysis and four Classification Analyses, expressed through four 3D scatter plots and one 4D scatter plot. Before diving into the hypotheses of the five predictive concepts, predictive analytics, linear regression and classification analysis are briefly explained below.

      What is Predictive Analytics? Predictive analytics uses data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes or circumstances based on historical data. The objective is to go beyond knowledge of past events and provide the best assessment of what will happen in the future. Companies employ predictive analytics to find patterns in data that identify risks and opportunities. In this approach linear regression and classification analysis are used to make predictions about the future outcomes and performance of significant factors for the movies in this project.

      What is Linear Regression? Linear regression is useful for finding the relationship between two continuous variables. One is the predictor, or independent variable, and the other is the response, or dependent variable. Linear regression looks for statistical relationships, not deterministic ones. A relationship between two variables is deterministic if one variable can be expressed exactly in terms of the other; a statistical relationship does not determine one variable exactly from the other. Linear regression uses the linear relationship between the dependent and independent variables to predict future outcomes.
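A minimal sketch of a simple linear regression, fitted by ordinary least squares with the standard library only. The helper name `fit_line` and the numbers are hypothetical; the project itself may well use a statistics or machine learning library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical budgets (predictor) and opening weekends (response), in millions of dollars
budgets = [10, 20, 30, 40]
opening_weekends = [7, 12, 17, 22]

slope, intercept = fit_line(budgets, opening_weekends)
predicted_opening = intercept + slope * 50  # prediction for a 50 million dollar budget
print(slope, intercept, predicted_opening)
```

Real box office data will not lie exactly on a line as this toy sample does, so the fitted line describes a statistical tendency rather than a deterministic rule.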

      What is Classification Analysis? Classification analysis is a data mining method that sorts unstructured data into structured classes and groups, which assists the discovery of hidden information and future planning. It can be used to answer a question, make a decision or predict behavior through machine learning. It works from a set of training data that contains a set of attributes together with the likely outcome. The job of the classification algorithm is to discover how that set of attributes leads to its outcome.
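To make the idea of training data with attributes and an outcome concrete, here is a small nearest-centroid classifier. This technique is chosen only for illustration; the document does not say which classification algorithm the project uses. The attributes (budget and opening weekend, in millions), the labels and the function names are all hypothetical.

```python
import math
from collections import defaultdict

def train_centroids(rows):
    """Average the attribute vectors of each class in the training data."""
    grouped = defaultdict(list)
    for attributes, label in rows:
        grouped[label].append(attributes)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }

def classify(centroids, attributes):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], attributes))

# Hypothetical training rows: ((budget, opening weekend), outcome)
training_rows = [
    ((10, 8), "profit"),
    ((20, 15), "profit"),
    ((100, 10), "loss"),
    ((150, 20), "loss"),
]
centroids = train_centroids(training_rows)
print(classify(centroids, (30, 12)), classify(centroids, (120, 18)))
```

In practice the attributes would be scaled first so that one large-valued column does not dominate the distance.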

    • The Blueprint of the Five Predictive Visualizations:
    • 1. Hypothesis: If the budget of a movie increases, do the opening weekend and profit also increase?

      Method Used: Linear Regression

      Type of Visualization: Animated 3D Scatter Plot

      Variables:
      X: Budget
      Y: Opening Weekend
      Z: Profit




      2. Hypothesis: If the movie was released in any particular season does it affect the opening weekend and profit of that movie?

      Method Used: Classification Analysis

      Type of Visualization: Animated 3D Scatter Plot

      Variables:
      X: Season
      Y: Opening Weekend
      Z: Profit




      3. Hypothesis: Based on the amount of budget used to create the movie, if the movie was released in any particular month will it increase the revenue of the movie?

      Method Used: Classification Analysis

      Type of Visualization: Animated 3D Scatter Plot

      Variables:
      X: Budget
      Y: Month Released
      Z: Revenue



      4. Hypothesis: Based on the amount of budget used to create the movie, if the movie was released in any particular season will it increase the profit of the movie?

      Method Used: Classification Analysis

      Type of Visualization: Animated 3D Scatter Plot

      Variables:
      X: Budget
      Y: Season
      Z: Profit



      5. Hypothesis: Based on the amount of budget used to create the movie, if the movie was released in any particular season in any particular month within that season, will it increase the opening weekend of the movie?

      Method Used: Classification Analysis

      Type of Visualization: Animated 4D Scatter Plot

      Variables:
      X: Budget
      Y: Season
      Z: Month Released
      C: Opening Weekend

      The explanation of all five hypotheses:
      "If the budget of a movie increases, do the opening weekend and profit also increase?"
      The purpose of this hypothesis is to see whether the budget of a movie is a significant linear predictor of its opening weekend and profit. In other words, is it possible that the higher the budget spent to create a movie, the higher its opening weekend and profit? The opening weekend matters here because it is an indicator of the success of a movie.

      "If the movie was released in any particular season does it affect the opening weekend and profit of that movie?"
      Classification analysis is used in this approach to identify and assign categories to this data set to allow predictive behaviour. The prediction targeted is using season as a significant predictor of opening weekend and profit. The classification model will be used to create categories that will help predict the opening weekend and profit based on the season the movie was released in.

      "Based on the amount of budget used to create the movie, if the movie was released in any particular month will it increase the revenue of the movie?"
      Classification analysis is used in this approach to identify and assign categories to this data set to allow predictive behaviour. The prediction targeted is using the budget used to produce the movie as a significant predictor of which month the movie should be released in to generate a certain amount of revenue. The classification model will be used to create categories based on the budget that was used to reach a certain amount of revenue, given the month the movie was released.

      "Based on the amount of budget used to create the movie, if the movie was released in any particular season will it increase the profit of the movie?"
      Classification analysis is used in this approach to identify and assign categories to this data set to allow predictive behaviour. The prediction targeted is using the budget as a significant predictor of which season the movie should be released in to generate a certain amount of profit. The classification model will be used to create categories based on the budget that was used to reach a certain amount of profit, given the season the movie was released in.

      "Based on the amount of budget used to create the movie, if the movie was released in any particular season in any particular month within that season, will it increase the opening weekend of the movie?"
      Classification analysis is used in this approach to identify and assign categories to this data set to allow predictive behaviour. The prediction targeted is using the budget as a significant predictor of which month within a particular season the movie should be released in to generate a particular opening weekend. The classification model will be used to create categories based on the budget that was used to reach a certain opening weekend, given the month within a particular season that the movie was released in. The predicted opening weekend and the initial budget used in this classification analysis can then serve as the X and Y variables in the linear regression analysis that predicts the profit of a movie from its budget and opening weekend.
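Several of the hypotheses above need a Season and a Month Released variable. A minimal sketch of deriving both from a release date is shown below. The document does not state how seasons were defined, so the meteorological Northern Hemisphere seasons used here are an assumption, and `SEASON_BY_MONTH` and `release_features` are hypothetical names.

```python
from datetime import date

# Assumption: meteorological seasons for the Northern Hemisphere
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

def release_features(released):
    """Return the (season, month) pair for a release date."""
    return SEASON_BY_MONTH[released.month], released.month

# Hypothetical release date
season, month = release_features(date(2012, 5, 4))
print(season, month)
```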


  • 4. Expenses:
    • Hollywood is a big business, making an estimated 200 billion dollars in revenue every year. However, a lot of money is spent to get those results. The budget of a big-budget film can average around 100 million dollars, which means a lot of tickets have to be sold to break even. The most expensive movie to produce was 'Pirates of the Caribbean: On Stranger Tides' at 422 million dollars. This approach was created to show the average amount of money a studio should expect to invest in a movie. The data is segmented by system rating, which corresponds to the target audience. Profit is the other variable: it is the expected result of the budget spent, and it is also segmented by system rating to project the average profit per category. This approach is a good guide to the profit that can be expected from the budget invested in a movie for the chosen audience. It helps decide which system rating is the most expensive to invest in within the drama genre, which audience best fits the desired budget, how much more money would need to be invested for a given target audience, and what profit to expect when the budget or the target audience is changed.
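The per-rating averages described above could be computed along these lines. This is a sketch with hypothetical sample rows (in millions of dollars) and a hypothetical helper name; profit is taken as gross minus budget, which ignores marketing and other costs.

```python
from collections import defaultdict
from statistics import mean

def average_budget_and_profit(movies):
    """Average budget and average profit (gross - budget) per system rating."""
    grouped = defaultdict(list)
    for movie in movies:
        grouped[movie["rating"]].append(movie)
    return {
        rating: {
            "avg_budget": mean(m["budget"] for m in rows),
            "avg_profit": mean(m["gross"] - m["budget"] for m in rows),
        }
        for rating, rows in grouped.items()
    }

# Hypothetical sample rows, in millions of dollars
sample_movies = [
    {"rating": "G", "budget": 50, "gross": 150},
    {"rating": "G", "budget": 70, "gross": 130},
    {"rating": "R", "budget": 20, "gross": 10},
]
expense_summary = average_budget_and_profit(sample_movies)
print(expense_summary)
```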

  • 5. Movies that made Profit:
    • After the tickets are bought, where does the money go, and to whom? Does all of it go straight to the studio, or is it shared with the theater? A portion of ticket sales goes to the theater owners, and the studio and the distributor, who are in charge of marketing the movie, get the remaining money. During a movie's opening weekend it is traditional for the studio to get a larger portion of the ticket sales. As the weeks go on, the theaters get a larger percentage than they did on the opening weekend. A studio makes about 60 percent of movie ticket sales in the United States, and around 20 to 40 percent of international ticket sales. The average price of a movie ticket in the United States as of 2022 is 9.16 dollars. Of that 9.16 dollars, the U.S. theater showing the movie receives 3.66 dollars and the studio receives 5.50 dollars. The average price of a movie ticket internationally is 10.85 dollars, of which the studio gets 4.34 dollars (40 percent). The theater uses its 3.66 dollars to pay for employees, maintenance, food, drinks and various other costs. The studio's combined 9.84 dollars from domestic and international sales is further divided into subcategories of roughly: Advertising and Marketing, 3.97 dollars; Production, 3.06 dollars; Movie Distribution, 1.79 dollars; Actors and Actresses, 1.22 dollars. This approach was created to see which movies, grouped by system rating or audience, make the most or the least profit. It helps decide which audience best fits the drama genre and which one should be invested in. It also shows the studio's final cut of the ticket sales generated from each system rating.
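The ticket split can be checked with a short calculation. The sketch below uses the prices and shares quoted in the paragraph (60 percent domestic, and 40 percent international, the upper end of the quoted 20 to 40 percent range); the function name `split_ticket` is hypothetical.

```python
def split_ticket(price, studio_share):
    """Split one ticket price into (studio cut, theater cut), rounded to cents."""
    studio_cut = round(price * studio_share, 2)
    theater_cut = round(price - studio_cut, 2)
    return studio_cut, theater_cut

# Figures quoted in the text: 9.16 dollars domestic, 10.85 dollars international
domestic_studio, domestic_theater = split_ticket(9.16, 0.60)
international_studio, _ = split_ticket(10.85, 0.40)
studio_total = round(domestic_studio + international_studio, 2)

print(domestic_studio, domestic_theater, international_studio, studio_total)
```

This reproduces the 5.50, 3.66 and 4.34 dollar figures and sums the two studio cuts.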

  • 6. Movies that did not make any money:
    • There is a lot of profit to be made in the Hollywood film industry; however, the economics of movie production are not as simple as they seem. There is no perfect path for a movie to make a profit, since factors like brand awareness, P&A (prints and advertising) budgets and the desires of a fickle public come into play. Theater attendance in the U.S. has been challenging in recent years, which makes it even more significant to appeal to foreign theaters as well. The mystery of the budget arises because it costs far more to make and market a movie than most people expect. For example, the production budget for a summer blockbuster like Marvel's 'The Avengers' was estimated to be 200 million dollars. Once you factor in marketing and advertising costs, the budget spikes. This approach was created to analyze the movies that failed to make a profit and created debt. It helps identify the mistakes these movies made, shines a light on the mistakes to avoid when producing and marketing movies, and shows which system ratings or target audiences are prone to them.
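A minimal sketch of isolating the loss-making movies and counting them per system rating. It compares gross against the production budget only, so it understates losses once marketing and advertising are factored in. The sample rows and function names are hypothetical.

```python
from collections import Counter

def loss_making(movies):
    """Movies whose gross did not cover the production budget."""
    return [m for m in movies if m["gross"] < m["budget"]]

def losses_by_rating(movies):
    """Number of loss-making movies per system rating."""
    return Counter(m["rating"] for m in loss_making(movies))

# Hypothetical sample rows, not real data
sample_movies = [
    {"name": "A", "rating": "R", "budget": 50_000_000, "gross": 20_000_000},
    {"name": "B", "rating": "R", "budget": 10_000_000, "gross": 40_000_000},
    {"name": "C", "rating": "PG", "budget": 30_000_000, "gross": 25_000_000},
]
flops = loss_making(sample_movies)
loss_counts = losses_by_rating(sample_movies)
total_loss = sum(m["budget"] - m["gross"] for m in flops)
print([m["name"] for m in flops], dict(loss_counts), total_loss)
```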

  • 7. Top 20 Highest Profitable Movies and Top 20 Lowest Profitable Movies:
    • Top 20 Highest Profitable Movies

      What makes a movie a blockbuster?
      These are some factors to creating blockbuster movies

      1. Size: A blockbuster sometimes creates new market flow worldwide by having a multi-dimensional impact on the industry and the audience. Sales records are broken and expectations are more than met by blockbuster movies.

      2. Speed: Sales volume is not the only characteristic; the speed of the sales trajectory matters too. The essence of a blockbuster is to shatter anything in its way in a short period of time. Blockbuster brands address pressing consumer needs so well that they often enjoy vertical sales lift-off.

      3. Scarcity: Stock-outs and shortages normally happen in the market when a blockbuster brand is in high demand. The speedy availability of counterfeits, as seen when the new iPhone was in high demand, is another indicator of popularity.

      4. Sustainability: A blockbuster brand is not a one-hit wonder. It is a gift that keeps on giving, just like the seven 'Harry Potter' books and their companion movies, along with DVD and merchandise sales, theme parks and so on. The 'Harry Potter' economy is valued at 15 billion dollars.

      5. Sizzle: A blockbuster movie is not just in high demand; it is magical and addresses an important need in an exciting and accessible way, just like the memorable and magical special effects in the 'Star Wars' series.

      Top 20 Lowest Profitable Movies

      How do Hollywood movies end up with the lowest ticket sales? These are some factors that can determine a movie's inability to be successful in ticket sales at the box office.

      1. Budget: It is important for the box office to be in proportion to the budget. To be somewhat profitable, a movie must generally take in at least 2 dollars at the box office for every dollar spent. For example, 'John Carter' (2012) had a worldwide gross of 284 million dollars and a domestic box office of 73 million dollars. The movie cost 250 million dollars, so the studio made a big loss and the production company made no money.

      2. Timing: Timing is very important when releasing a movie. Poor timing can keep the right audience from going to the theater because other things occupy their attention, such as major sports events, fun fairs, carnivals, concerts or bad weather. It is especially important not to release a movie while it would compete with a film of the same genre and a similar plot, or too soon after a similar film has already absorbed the audience's interest in the premise.

      3. Bad Buzz: Bad buzz is created around a movie by people who have already seen it, by articles and social media news about production problems, by bad ratings and reviews from critics, or by poor word of mouth. All of this contributes to the movie's inability to be successful.

      The previous approach analyzed the movies that did not make any profit and went into debt, to help create strategies that prevent those mistakes. But a studio does not only want to avoid losing money; it also wants to be as profitable as possible. This approach analyzes which movies were the 20 lowest profitable and which were the 20 highest profitable, to learn what mistakes made the bottom 20 less profitable than the rest of the field and to investigate what made the top 20 so successful compared with the rest.
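A sketch of how the two lists might be built, together with the 2-dollars-per-dollar rule of thumb from the Budget factor. Profit is taken as gross minus budget, the sample rows (in millions of dollars) are hypothetical, and the function names are illustrative; only the 'John Carter' figures come from the text.

```python
def profit(movie):
    """Profit as gross minus budget."""
    return movie["gross"] - movie["budget"]

def top_and_bottom(movies, n=20):
    """Return the n most profitable movies and the n least profitable movies."""
    ranked = sorted(movies, key=profit, reverse=True)
    return ranked[:n], ranked[::-1][:n]

def clears_two_to_one(budget, gross):
    """Rule of thumb: at least 2 dollars at the box office per dollar spent."""
    return gross >= 2 * budget

# Hypothetical sample rows, in millions of dollars
sample_movies = [
    {"name": "A", "budget": 10, "gross": 60},
    {"name": "B", "budget": 30, "gross": 20},
    {"name": "C", "budget": 20, "gross": 40},
    {"name": "D", "budget": 15, "gross": 20},
    {"name": "E", "budget": 50, "gross": 20},
]
top, bottom = top_and_bottom(sample_movies, n=2)

# 'John Carter' (2012) with the figures quoted above: 250 million budget, 284 million gross
john_carter_ok = clears_two_to_one(250_000_000, 284_000_000)
print([m["name"] for m in top], [m["name"] for m in bottom], john_carter_ok)
```

The bottom list is ordered from the least profitable upward, so the biggest loss appears first.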

  • 8. The Most Successful Movie:
    • Making a movie is not easy, especially if the movie has a lot of expensive props, scenes and actors. Like most other undertakings, producing a movie can be broken down into simple characteristics that can potentially make it successful. The top characteristics of a successful movie are script and screenplay, director and cast, and differentiation and mass appeal.

      Script and Screenplay: A good story, meaning the script and the plot, is one of the greatest factors in creating a successful movie. It is one of the most significant things about any great film. If the story is really out of the box and truly gripping, the film can be a blockbuster. If it captures attention and interest, most viewers will give it an A rating. After a good story and plot comes the screenplay. The screenplay determines how the story is shown to the audience, and in what sequence, so that the audience goes through a spectacular journey or fantastic experience. Setting the correct order of what to show and when is really important. This can also be seen in trailers and advertisements, which allow the audience to relate to the characters in the story and start to learn about the plot of the movie.

      Directors and Cast/Actors: A great director can make a movie successful by setting things in motion and making the cast come together to give the movie a quality appearance and feel. The cast is also a significant factor: if there is chemistry within the cast, it can make a movie extremely successful, which has been proven historically.

      Differentiation and Mass Appeal: This may be a little far-fetched, but differentiation has its entertainment quality. If the film produces scenes that the audience has not seen before, even if that means the actors step out of their usual comfort zone in the characters they normally play, then the movie may be interesting enough to be good. Over and above all the other factors, mass appeal may be the number one criterion for the success of a movie. If the studio is able to please a large audience, the movie may be a hit. One of the best ways to capture the largest audience is to make the topic a little controversial. The controversy may start discussions and debates on social media platforms or among friends and family. Getting people to talk may entice the public to flock to the theaters to see what the commotion is about.

    • When it comes down to being successful at the box office, the recipe is pretty simple: small budget + massive ticket sales = huge profit. Done correctly, this means an enormous ROI and NPM will be waiting for the studio. In this approach ROI is used to tell the top 20 most profitable movies apart and see which of the blockbusters is really the most successful. According to The Numbers, three movies have mastered the money-making recipe and became extremely profitable with a strong ROI.
      The Top 3 Highest ROI Movies

      1. Movie: Deep Throat (1972)

      Budget: 25,000 dollars

      Return: 22,528,467 dollars

      2. Movie: Facing the Giants (2006)

      Budget: 100,000 dollars

      Return: 38,551,255 dollars

      3. Movie: Paranormal Activity (2007)

      Budget: 450,000 dollars

      Return: 89,376,549 dollars

      1. Deep Throat (1972) is an American pornographic film that was at the forefront of the Golden Age of Porn (1969-1984). The film was written and directed by Gerard Damiano. It was one of the first pornographic films to feature a plot, character development and relatively high production values. 'Deep Throat' got a great deal of mainstream attention and launched the "porno chic" trend. It ended up earning an ROI of 90,014 percent, a number that has held the top spot for the past 50 years.

      2. Movies with a sports theme often became major box office hits in the 2000s. Facing the Giants (2006) is a Christian drama with a sports sub-genre, a combination that turned modest budgets into box office blockbusters. It ended up earning an ROI of 38,451 percent.

      3. Paranormal Activity (2007) was written and directed by Oren Peli and became a classic horror film. The movie is about a young couple haunted by a supernatural entity in their house. It ended up with an ROI of 19,761 percent.
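The three ROI percentages can be reproduced from the figures listed for each movie. The sketch assumes that the large dollar figure given for each movie is the total amount returned before the budget is subtracted, which is the reading under which the stated percentages come out; `roi_percent` is a hypothetical helper name.

```python
def roi_percent(budget, total_return):
    """ROI as a percentage: (total return - budget) / budget * 100."""
    return (total_return - budget) / budget * 100

# (title, budget, total return) as listed above
top_three = [
    ("Deep Throat (1972)", 25_000, 22_528_467),
    ("Facing the Giants (2006)", 100_000, 38_551_255),
    ("Paranormal Activity (2007)", 450_000, 89_376_549),
]
roi_table = {
    title: round(roi_percent(budget, total_return))
    for title, budget, total_return in top_three
}
print(roi_table)
```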

    • This is the battle of the blockbusters: finding the most successful movie in the drama genre. This approach dives deep into revenue and net profit and uses ROI and NPM to get a much more vivid understanding of which movie was the best among the top 20 highest-grossing blockbusters of the drama genre, and how it achieved that. Just because a movie made the most profit does not mean it is more profitable than a movie that made less. The approach also shows which system rating the most successful movie belongs to.

  • 9. Tickets Sold:
    • One of the golden rules in the film industry is to avoid releasing movies during the dump months. What are dump months? They are the months of the year when movie theaters play movies with lower critical expectations or less buzz surrounding them. Dump months fall late in the summer, when everyone is going back to school and work, and after the Christmas season has passed. January and February, and August and September, are the main dump months for movie release dates. March can sometimes be considered a dump month as well. Dump months do not apply to every movie; a good example is 'The Silence of the Lambs', which was released in February 1991 and won the Academy Award for Best Picture. Dump months can be a great time to release undiscovered gems, as there is not much competition from other movies. This approach also shows, within each system rating, which season produces the most tickets sold, and compares the net profit margin of each season. The goal is to find the best time for each system rating to release its movies, and to see how to mimic the system rating with the highest net profit margin by adjusting expenses, to get the best of both worlds: the highest tickets sold and the highest net profit margin.
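A small sketch of the two quantities this insight relies on: whether a release month is a dump month, and an estimate of tickets sold. Estimating tickets as domestic gross divided by an average ticket price is an assumption, not a method confirmed by the project; the 9.16 dollar price is the 2022 U.S. average quoted in the document, and the names are hypothetical.

```python
DUMP_MONTHS = {1, 2, 8, 9}      # January, February, August, September
AVERAGE_TICKET_PRICE = 9.16     # average U.S. ticket price quoted in this document

def is_dump_month(month):
    """True if the release month (1-12) is one of the main dump months."""
    return month in DUMP_MONTHS

def estimated_tickets(domestic_gross, ticket_price=AVERAGE_TICKET_PRICE):
    """Rough ticket count: domestic gross divided by the average ticket price."""
    return round(domestic_gross / ticket_price)

# Hypothetical movie: 9.16 million dollars domestic gross, released in January
tickets = estimated_tickets(9_160_000)
released_in_dump_month = is_dump_month(1)
print(tickets, released_in_dump_month)
```

Because ticket prices change from year to year, a single average price only gives a rough comparison between movies released in different years.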

The Audience

Before starting the technical approaches to the insights, we will take a look at the different types of audiences in the movie industry. After all, audiences are the ones who purchase the tickets. The movie industry puts audiences into many different categories, but these depend on an age-related scheme that follows the film certification categories ('G', 'PG', 'PG-13', 'R', 'NC-17').

G Rated Movies

The target audience: Infants (ages 0-1), Toddlers (1-3), Preschool age (3-4 years), School age (4-5 years)
G-rated movies are family-friendly films. This category of movies is for General Audiences, and all ages are admitted. This means there is nothing in theme, language, nudity, sex, violence or other matters that the ratings board thinks would offend parents whose younger children view the picture (no mocking religion, no illegal drug use). One of the things that would be fine in a G-rated film today is international romance. The G rating went from being the mark of films for a general audience to being the mark of films for children. It represents a ghetto largely made up of nature films and animation. Parents have decided that it is suitable for babies. In 1995, 'Babe' was a big hit, and at first glance it would seem to be a picture strictly for children: a 'G'-rated live-action comedy-drama about barnyard animals that talk. But it also appealed to adults, teens and everyone in between. It was clever, smart and very funny, raising the bar quite high for live-action family fare that truly appeals to every member of the family, and it drew an audience that went far beyond the toddler set.

PG Rated Movies

The target audience: Kids who are between the ages of 5-13 years old
The Walt Disney Company and the Classification Board itself have joined children's groups in calling for the "parental guidance" rating to be split into separate categories for younger viewers and those approaching their teens. The 'PG' label is used for movies that contain mild violence, sex, nudity, drug use and coarse language. "Little Women" got a 'G' rating. Most 'PG' movies are so innocent these days that parents do not research them and mistake some 'PG' movies for little-kid movies. This is the rating for most kids' movies and select adult movies without gratuitous content, but to teenagers, who think they are too cool for it, it can seem as childish as the 'G' rating does to pre-teens. Brief mild language or thematic elements are nothing to really worry about, but some 'PG' movies do show weapons being used.

PG-13 Rated Movies

The target audience: Teenagers (ages 13-17 years)
There are a lot of things a 13-year-old can handle that her 10-year-old little brother can't. This means that 'PG-13' movies might be okay for teenagers to watch but not for little kids, and they might not be suitable for kids in the tween age group (ages 8-12). Studios have always found themselves under enormous economic pressure to reach as broad an audience as possible. 'PG-13' movies were, in theory, tame enough for teenagers to see but edgy enough for adults to queue up as well. It is the adults who really drive the success of summer movies like 'The Avengers' and 'Star Trek Into Darkness'. Hollywood greed drives the 'PG-13' monster, but the real source of the monster is the adults. The problem is that the young man or woman wants his or her comic books infused with sex and real violence, while a teenager might want the same but deep down is happier with just the comic book. 'PG-13' movies may seem to be only for teenagers; however, parents buy tickets for their kids and also for themselves, because they share the same interests as their kids. Parents across the country are not deterred by the grim, pessimistic material in 'PG-13' movies. In fact, they are lining up in droves. You would think that 'The Hunger Games', in which kids are forced to fight each other to the death, sounds like great kiddie fare, right?

R Rated Movies

The target audience: People between the ages of 13-35 years old
The target audience is teenagers and adults. 'R'-rated movies sometimes raise interest among teenagers because they seem more daring and exciting, hot and sexy, and bold. However, the content is raunchy enough that it is harder for many teens to see these movies: although the target audience is people between the ages of 13 and 35, kids cannot get in without an adult. Interest among kids under 17 years old is clearly stoked by 'R'-rated movies, and there is a huge buzz among teens and college students. The issue is how teens can get tickets without their parents knowing. A parent or guardian is not the only one who can get tickets for a teen; it could be anyone over 21 years old. If teenagers can get people to buy them beer, I don't think they will have trouble getting 'R'-rated movie tickets.


NC-17 Rated Movies

The target audience: People that are older than 17 years old
'NC-17' stands for "No One 17 and Under Admitted". In these types of films, many scenes are inappropriate for those who are 17 or under, because they may contain gore, intimate themes, and other material that could harm a young audience. 'NC-17' is the highest rating in the Motion Picture Association (MPA) film rating system and is assigned to films with content the MPA finds suitable only for ages 18 and older. The material in these movies has a higher impact than the 'Restricted (R)' rating can accommodate. Some films that get the 'NC-17' rating are re-edited and resubmitted to try to get the 'R' rating, as it permits a larger audience, allowing wider distribution and greater potential for commercial performance. Additionally, an 'R' rating allows children under 17 years old to attend when accompanied by a parent or adult guardian. If re-editing the film does not earn it the 'R' rating, in some cases studios will "surrender" the rating, returning it and leaving the film unrated.

Technical Approach

Before any data is extracted, the libraries used in this analysis, such as Altair, Pandas, and Pandas_highcharts, need to be installed; built-in modules such as Collections only need to be imported. Together they help create insightful, interactive graphs.
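A quick way to see which of these libraries still need installing is to ask Python whether each import name can be found. This is only a sketch: the helper name `missing_modules` is hypothetical, and the `required` list is assumed from the imports in this notebook (import names, which can differ from the package names used when installing).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


# Import names assumed from the import cells of this notebook
required = ["pandas", "seaborn", "matplotlib", "plotly", "numpy", "scipy",
            "sklearn", "statsmodels", "dataframe_image", "pandas_highcharts"]

# Any name printed here still needs to be installed before running the analysis
print(missing_modules(required))
```

An empty list means every library in `required` is available in the current environment.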

The 'Pandas' library is used to read the data from CSV files and load it into a dataframe.

In [36]:
# importing module
import pandas as pd 

Seaborn is a library based on Matplotlib that will be used for data visualization in this analysis.

In [37]:
import seaborn as sns

Collections is a built-in Python module. In this analysis, its 'Counter' class will be used to detect repetition in a list.

In [38]:
# importing module
from collections import Counter
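As a small illustration of detecting repetition with 'Counter', the sketch below counts how often each title appears in a list and keeps only the titles that occur more than once. The sample titles are made up for the example.

```python
from collections import Counter

# Made-up sample list with some repeated titles
titles = ["Robin Hood", "Tangled", "Robin Hood", "Inception", "Tangled", "Robin Hood"]

# Count every title, then keep only the ones that repeat
counts = Counter(titles)
repeated = {title: n for title, n in counts.items() if n > 1}

print(repeated)  # {'Robin Hood': 3, 'Tangled': 2}
```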

Matplotlib is a library that will be used in this analysis for creating visualizations.

In [39]:
# importing module
import matplotlib.pyplot as plt 

Statistics is a built-in Python module that provides functions for mathematical statistics, such as the mean, median, mode, variance, and standard deviation.

In [40]:
# importing module
import statistics 
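The calculations named above can be sketched on a short list of numbers. The values here are illustrative only and do not come from the final dataframe.

```python
import statistics

# Illustrative movie scores
scores = [8.4, 5.8, 8.7, 7.7, 7.3]

mean_score = statistics.mean(scores)          # arithmetic average
median_score = statistics.median(scores)      # middle value after sorting
variance_score = statistics.variance(scores)  # sample variance
stdev_score = statistics.stdev(scores)        # sample standard deviation

print(mean_score, median_score, variance_score, stdev_score)
```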

Plotly.graph_objects is a module that will be used in this analysis for creating graphs and charts.

In [41]:
# importing module
import plotly.graph_objects as go 

NumPy is a Python library that will be used in this analysis to create arrays.

In [42]:
# importing module
import numpy as np

Math is a built-in Python module that provides mathematical functions.

In [43]:
# importing module
import math

IPython.display is a module that will be used in this analysis for displaying graphs and charts made in Python as HTML.

In [44]:
# importing module
from IPython.display import display_html 

Collections is a built-in Python module. In this analysis, its 'defaultdict' class will be used to detect duplicates in a list.

In [45]:
# importing module
from collections import defaultdict 
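One way 'defaultdict' can detect duplicates is by recording the positions at which each item occurs and keeping the items seen more than once. The helper name `duplicate_positions` and the sample list are hypothetical, so this is a sketch of the idea rather than the code used later in the analysis.

```python
from collections import defaultdict


def duplicate_positions(items):
    """Map each duplicated item to the list of indexes where it appears."""
    positions = defaultdict(list)  # a missing key starts as an empty list
    for index, item in enumerate(items):
        positions[item].append(index)
    return {item: found for item, found in positions.items() if len(found) > 1}


sample = ["Fame", "Popeye", "Fame", "The Fog", "Popeye"]
print(duplicate_positions(sample))  # {'Fame': [0, 2], 'Popeye': [1, 4]}
```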

The 'norm' object from the 'scipy.stats' module will be used to visualize the Normal Distribution of the data.

In [46]:
# importing module
from scipy.stats import norm

Dataframe_image is a module that will be used in this analysis to save dataframes as pictures.

In [47]:
# importing module
import dataframe_image as dfi 

The 'kurtosis' function from the 'scipy.stats' module will be used to get the Kurtosis of the distribution of the data.

In [48]:
# importing module
from scipy.stats import kurtosis 

The 'skew' function from the 'scipy.stats' module will be used to get the Skewness of the distribution of the data.

In [49]:
# importing module
from scipy.stats import skew 

The 'stats' module from 'scipy' will be used to calculate the Trimmed Mean of the data.

In [50]:
# importing module
from scipy import stats 

OrderedDict is a data type in the collections module that tracks the order in which items were added.

In [51]:
# importing module
from collections import Counter, OrderedDict

The Sqlite3 library helps insert and change rows and manage an SQL database file.

In [52]:
# importing module
import sqlite3
In [53]:
# importing module
from sklearn import linear_model
In [54]:
# importing module
from matplotlib.animation import FuncAnimation
In [55]:
# importing module
from mpl_toolkits.mplot3d import Axes3D
In [56]:
# importing module
import statsmodels.formula.api as smf
In [57]:
# display Matplotlib figures inline in the notebook
%matplotlib inline
In [58]:
# importing module
from matplotlib import animation
In [59]:
# importing module
from matplotlib import cm
In [60]:
from sklearn.cluster import KMeans
In [61]:
from sklearn.preprocessing import MinMaxScaler
In [62]:
from sklearn.neighbors import NearestNeighbors
In [63]:
from sklearn import tree
In [64]:
from sklearn.tree import DecisionTreeClassifier

The pandas_highcharts.core module helps create interactive Highcharts graphs and charts.

In [65]:
# importing module
from pandas_highcharts.core import serialize

The pandas_highcharts.display module provides 'display_charts', which displays the Highcharts graphs and charts in the notebook.

In [66]:
# importing module
from pandas_highcharts.display import display_charts

Json is a built-in Python module for encoding and decoding JSON data.

In [67]:
# importing module
import json

The '%store' magic command stores dataframes, lists, and any other object so they do not have to be computed or created again. They can easily be restored later.

In [370]:
%store Drama_DataFrame
%store df_data
%store system_rating_r
%store system_rating_pg
%store system_rating_pg13
%store system_rating_nc17
%store system_rating_g
%store dataframe_RIO_r
%store dataframe_RIO_pg
%store dataframe_RIO_pg13
%store dataframe_RIO_g
%store dataframe_RIO_NC
%store df_cost_r
%store freq_dis
%store cum_rel_freq
%store freq_cum_dis
%store df_roi_r
%store freq_dis_roi
%store freq_cum_dis1
%store cum_rel_freq1
%store df_roi_per_r
%store freq_dis3
%store freq_cum_dis2
%store cum_rel_freq2
%store df1
%store df2
%store df3
%store df4
%store df5
%store df_opening
%store df_month
%store df_season
%store df_4D
%store df_profit_season
Stored 'Drama_DataFrame' (DataFrame)
Stored 'df_data' (DataFrame)
Stored 'system_rating_r' (DataFrame)
Stored 'system_rating_pg' (DataFrame)
Stored 'system_rating_pg13' (DataFrame)
Stored 'system_rating_nc17' (DataFrame)
Stored 'system_rating_g' (DataFrame)
Stored 'dataframe_RIO_r' (DataFrame)
Stored 'dataframe_RIO_pg' (DataFrame)
Stored 'dataframe_RIO_pg13' (DataFrame)
Stored 'dataframe_RIO_g' (DataFrame)
Stored 'dataframe_RIO_NC' (DataFrame)
Stored 'df_cost_r' (DataFrame)
Stored 'freq_dis' (DataFrame)
Stored 'cum_rel_freq' (DataFrame)
Stored 'freq_cum_dis' (DataFrame)
Stored 'df_roi_r' (DataFrame)
Stored 'freq_dis_roi' (DataFrame)
Stored 'freq_cum_dis1' (DataFrame)
Stored 'cum_rel_freq1' (DataFrame)
Stored 'df_roi_per_r' (DataFrame)
Stored 'freq_dis3' (DataFrame)
Stored 'freq_cum_dis2' (DataFrame)
Stored 'cum_rel_freq2' (DataFrame)
Stored 'df1' (DataFrame)
Stored 'df2' (DataFrame)
Stored 'df3' (DataFrame)
Stored 'df4' (DataFrame)
Stored 'df5' (DataFrame)
Stored 'df_opening' (DataFrame)
Stored 'df_month' (DataFrame)
Stored 'df_season' (DataFrame)
Stored 'df_4D' (DataFrame)
Stored 'df_profit_season' (DataFrame)

The '%store -r' command retrieves the dataframes or any other object that was stored with the '%store' command.

In [371]:
%store -r Drama_DataFrame
%store -r df_data
%store -r system_rating_r
%store -r system_rating_pg
%store -r system_rating_pg13
%store -r system_rating_nc17
%store -r system_rating_g
%store -r dataframe_RIO_r
%store -r dataframe_RIO_pg
%store -r dataframe_RIO_pg13
%store -r dataframe_RIO_g
%store -r dataframe_RIO_NC
%store -r df_cost_r
%store -r freq_dis
%store -r cum_rel_freq
%store -r freq_cum_dis
%store -r df_roi_r
%store -r freq_dis_roi
%store -r freq_cum_dis1
%store -r cum_rel_freq1
%store -r df_roi_per_r
%store -r freq_dis3
%store -r freq_cum_dis2
%store -r cum_rel_freq2
%store -r df1
%store -r df2
%store -r df3
%store -r df4
%store -r df5
%store -r df_opening
%store -r df_month
%store -r df_season
%store -r df_4D
%store -r df_profit_season

Before the graphs are made, the dataframes are extracted from the data files. The first dataframe to be extracted is called movie_df, which is read from the movies.csv file.

In [64]:
movie_df = pd.read_csv("movies.csv")

Checking the dataframe and getting the first five rows of the dataframe

In [84]:
movie_df.head()
Out[84]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 The Shining R Drama 1980 June 13, 1980 (United States) 8.4 927000.0 Stanley Kubrick Stephen King Jack Nicholson United Kingdom 19000000.0 46998772.0 Warner Bros. 146.0
1 The Blue Lagoon R Adventure 1980 July 2, 1980 (United States) 5.8 65000.0 Randal Kleiser Henry De Vere Stacpoole Brooke Shields United States 4500000.0 58853106.0 Columbia Pictures 104.0
2 Star Wars: Episode V - The Empire Strikes Back PG Action 1980 June 20, 1980 (United States) 8.7 1200000.0 Irvin Kershner Leigh Brackett Mark Hamill United States 18000000.0 538375067.0 Lucasfilm 124.0
3 Airplane! PG Comedy 1980 July 2, 1980 (United States) 7.7 221000.0 Jim Abrahams Jim Abrahams Robert Hays United States 3500000.0 83453539.0 Paramount Pictures 88.0
4 Caddyshack R Comedy 1980 July 25, 1980 (United States) 7.3 108000.0 Harold Ramis Brian Doyle-Murray Chevy Chase United States 6000000.0 39846344.0 Orion Pictures 98.0

Using SQL to extract tables from the im.db file, starting by opening a connection to the file.

In [65]:
connection = sqlite3.connect("im.db") 

Creating a cursor on the connection.

In [66]:
cursor = connection.cursor()

Extracting the movie_basics table from the im.db file.

In [67]:
movie_basics_df = pd.read_sql("""SELECT * FROM movie_basics""",connection)
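Before selecting from a table, it can help to confirm which tables the database file actually holds. The sketch below uses an in-memory database as a stand-in for im.db; the table layout and the two inserted rows are assumptions made only for the example.

```python
import sqlite3

# In-memory stand-in for im.db (assumption: the real file has more tables and columns)
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE movie_basics (movie_id TEXT, primary_title TEXT, start_year INTEGER)"
)
connection.executemany(
    "INSERT INTO movie_basics VALUES (?, ?, ?)",
    [("tt0063540", "Sunghursh", 2013),
     ("tt0069049", "The Other Side of the Wind", 2018)],
)

# sqlite_master lists every table stored in the database
tables = [row[0] for row in connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# A parameterized query keeps values out of the SQL string
rows = connection.execute(
    "SELECT primary_title FROM movie_basics WHERE start_year = ?", (2018,)
).fetchall()

connection.close()
print(tables, rows)
```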

Checking the dataframe and getting the first five rows of the dataframe

In [74]:
movie_basics_df.head()
Out[74]:
movie_id movie original_title start_year runtime_minutes genres
0 tt0063540 Sunghursh Sunghursh 2013 175.0 Action,Crime,Drama
1 tt0066787 One Day Before the Rainy Season Ashad Ka Ek Din 2019 114.0 Biography,Drama
2 tt0069049 The Other Side of the Wind The Other Side of the Wind 2018 122.0 Drama
3 tt0069204 Sabse Bada Sukh Sabse Bada Sukh 2018 NaN Comedy,Drama
4 tt0100275 The Wandering Soap Opera La Telenovela Errante 2017 80.0 Comedy,Drama,Fantasy

Extracting the movie_ratings table from the im.db file.

In [68]:
movie_ratings_df = pd.read_sql("""SELECT * FROM movie_ratings""",connection)

Checking the dataframe and getting the first five rows of the dataframe

In [75]:
movie_ratings_df.head()
Out[75]:
movie_id averagerating numvotes
0 tt10356526 8.3 31
1 tt10384606 8.9 559
2 tt1042974 6.4 20
3 tt1043726 4.2 50352
4 tt1060240 6.5 21

Extracting data from the bom.movie_gross.csv.gz file and making it into a dataframe called movie_gross_df.

In [69]:
movie_gross_df = pd.read_csv("bom.movie_gross (5).csv.gz")

Checking the dataframe and getting the first five rows of the dataframe

In [76]:
movie_gross_df.head()
Out[76]:
movie studio domestic_gross foreign_gross year
0 Toy Story 3 BV 415000000.0 652000000 2010
1 Alice in Wonderland (2010) BV 334200000.0 691300000 2010
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000.0 664300000 2010
3 Inception WB 292600000.0 535700000 2010
4 Shrek Forever After P/DW 238700000.0 513900000 2010

Extracting data from the tn.movie_budgets.csv.gz file and making it into a dataframe called movie_budgets_df.

In [70]:
movie_budgets_df = pd.read_csv("tn.movie_budgets.csv.gz")

Checking the dataframe and getting the first five rows of the dataframe

In [77]:
movie_budgets_df.head()
Out[77]:
id release_date movie production_budget domestic_gross worldwide_gross
0 1 Dec 18, 2009 Avatar $425,000,000 $760,507,625 $2,776,345,279
1 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides $410,600,000 $241,063,875 $1,045,663,875
2 3 Jun 7, 2019 Dark Phoenix $350,000,000 $42,762,350 $149,762,350
3 4 May 1, 2015 Avengers: Age of Ultron $330,600,000 $459,005,868 $1,403,013,963
4 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi $317,000,000 $620,181,382 $1,316,721,747

Changing the column 'name' to 'movie' to be able to merge movie_df dataframe with others using the same column

In [71]:
movie_df.columns = movie_df.columns.str.replace('name', 'movie')

Checking the dataframe and getting the first five rows of the dataframe

In [78]:
movie_df.head()
Out[78]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 The Shining R Drama 1980 June 13, 1980 (United States) 8.4 927000.0 Stanley Kubrick Stephen King Jack Nicholson United Kingdom 19000000.0 46998772.0 Warner Bros. 146.0
1 The Blue Lagoon R Adventure 1980 July 2, 1980 (United States) 5.8 65000.0 Randal Kleiser Henry De Vere Stacpoole Brooke Shields United States 4500000.0 58853106.0 Columbia Pictures 104.0
2 Star Wars: Episode V - The Empire Strikes Back PG Action 1980 June 20, 1980 (United States) 8.7 1200000.0 Irvin Kershner Leigh Brackett Mark Hamill United States 18000000.0 538375067.0 Lucasfilm 124.0
3 Airplane! PG Comedy 1980 July 2, 1980 (United States) 7.7 221000.0 Jim Abrahams Jim Abrahams Robert Hays United States 3500000.0 83453539.0 Paramount Pictures 88.0
4 Caddyshack R Comedy 1980 July 25, 1980 (United States) 7.3 108000.0 Harold Ramis Brian Doyle-Murray Chevy Chase United States 6000000.0 39846344.0 Orion Pictures 98.0

Changing the column 'primary_title' to 'movie' to be able to merge movie_basics_df dataframe with others using the same column

In [72]:
movie_basics_df.columns = movie_basics_df.columns.str.replace('primary_title', 'movie')

Checking the dataframe and getting the first five rows of the dataframe

In [79]:
movie_basics_df.head()
Out[79]:
movie_id movie original_title start_year runtime_minutes genres
0 tt0063540 Sunghursh Sunghursh 2013 175.0 Action,Crime,Drama
1 tt0066787 One Day Before the Rainy Season Ashad Ka Ek Din 2019 114.0 Biography,Drama
2 tt0069049 The Other Side of the Wind The Other Side of the Wind 2018 122.0 Drama
3 tt0069204 Sabse Bada Sukh Sabse Bada Sukh 2018 NaN Comedy,Drama
4 tt0100275 The Wandering Soap Opera La Telenovela Errante 2017 80.0 Comedy,Drama,Fantasy

Changing the column 'title' to 'movie' to be able to merge movie_gross_df dataframe with others using the same column

In [73]:
movie_gross_df.columns = movie_gross_df.columns.str.replace('title', 'movie')
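The renames above rely on substring replacement over all column names, which works here because only one column contains the replaced text. As a hedged alternative, `DataFrame.rename` targets one exact column name. The small dataframe below is a made-up stand-in with two title columns to show the difference.

```python
import pandas as pd

# Made-up stand-in with two columns that both contain the text 'title'
df = pd.DataFrame({"primary_title": ["Sunghursh"], "original_title": ["Sunghursh"]})

# rename() changes only the column that matches the key exactly
renamed = df.rename(columns={"primary_title": "movie"})

# Substring replacement touches every column name containing 'title'
substring_renamed = df.columns.str.replace("title", "movie", regex=False)

print(list(renamed.columns))    # ['movie', 'original_title']
print(list(substring_renamed))  # ['primary_movie', 'original_movie']
```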

Checking the dataframe and getting the first five rows of the dataframe

In [80]:
movie_gross_df.head()
Out[80]:
movie studio domestic_gross foreign_gross year
0 Toy Story 3 BV 415000000.0 652000000 2010
1 Alice in Wonderland (2010) BV 334200000.0 691300000 2010
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000.0 664300000 2010
3 Inception WB 292600000.0 535700000 2010
4 Shrek Forever After P/DW 238700000.0 513900000 2010

After renaming the columns that hold the movie name, every dataframe has that column under the same name, 'movie'. The next code checks whether the 'movie' columns share movies across dataframes, to see if the dataframes can be merged using the 'movie' column.

This is the 'common_data' function, which checks whether two lists have at least one element in common.

In [14]:
def common_data(list1, list2):
    result = False
  
    # traverse in the 1st list
    for x in list1:
  
        # traverse in the 2nd list
        for y in list2:
    
            # if one common
            if x == y:
                result = True
                return result 
                  
    return result
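The nested loops compare every pair of elements, which can be slow for lists as long as the ones used here. A set-based version gives the same True/False answer; the function name `has_common_element` is hypothetical and this is only a sketch of the alternative.

```python
def has_common_element(list1, list2):
    """Return True if the two lists share at least one element."""
    # isdisjoint() is True when nothing is shared, so negate it
    return not set(list1).isdisjoint(list2)


print(has_common_element(["Avatar", "Tangled"], ["Tangled", "Inception"]))  # True
print(has_common_element(["Avatar"], ["Inception"]))                        # False
```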

Creating a list of movie names from each dataframe to check commonality, that is, to check whether the dataframes can be merged using the 'movie' column.

In [105]:
# one list of movie titles per dataframe
list1 = movie_budgets_df.movie.tolist()
list2 = movie_gross_df.movie.tolist()
list3 = movie_basics_df.movie.tolist()
list4 = movie_budgets_df.movie.tolist()  # same source as list1
list5 = movie_df.movie.tolist()

Checking the number of elements in the 'list1' list.

In [113]:
len(list1)
Out[113]:
5782

Printing the first 20 elements in the 'list1' list.

In [107]:
print(list1[:20])
['Avatar', 'Pirates of the Caribbean: On Stranger Tides', 'Dark Phoenix', 'Avengers: Age of Ultron', 'Star Wars Ep. VIII: The Last Jedi', 'Star Wars Ep. VII: The Force Awakens', 'Avengers: Infinity War', 'Pirates of the Caribbean: At Worldâ\x80\x99s End', 'Justice League', 'Spectre', 'The Dark Knight Rises', 'Solo: A Star Wars Story', 'The Lone Ranger', 'John Carter', 'Tangled', 'Spider-Man 3', 'Captain America: Civil War', 'Batman v Superman: Dawn of Justice', 'The Hobbit: An Unexpected Journey', 'Harry Potter and the Half-Blood Prince']

Checking the number of elements in the 'list2' list.

In [114]:
len(list2)
Out[114]:
3387

Printing the first 20 elements in the 'list2' list.

In [108]:
print(list2[:20])
['Toy Story 3', 'Alice in Wonderland (2010)', 'Harry Potter and the Deathly Hallows Part 1', 'Inception', 'Shrek Forever After', 'The Twilight Saga: Eclipse', 'Iron Man 2', 'Tangled', 'Despicable Me', 'How to Train Your Dragon', 'Clash of the Titans (2010)', 'The Chronicles of Narnia: The Voyage of the Dawn Treader', "The King's Speech", 'Tron Legacy', 'The Karate Kid', 'Prince of Persia: The Sands of Time', 'Black Swan', 'Megamind', 'Robin Hood', 'The Last Airbender']

Checking the number of elements in the 'list3' list.

In [115]:
len(list3)
Out[115]:
146144

Printing the first 20 elements in the 'list3' list.

In [109]:
print(list3[:20])
['Sunghursh', 'One Day Before the Rainy Season', 'The Other Side of the Wind', 'Sabse Bada Sukh', 'The Wandering Soap Opera', 'A Thin Life', 'Bigfoot', 'Joe Finds Grace', 'O Silêncio', 'Nema aviona za Zagreb', 'Pál Adrienn', 'So Much for Justice!', 'Cooper and Hemingway: The True Gen', 'Children of the Green Dragon', 'T.G.M. - osvoboditel', 'The Tragedy of Man', "How Huang Fei-hong Rescued the Orphan from the Tiger's Den", 'Heaven & Hell', 'The Final Journey', 'Los pájaros se van con la muerte']

Checking the number of elements in the 'list5' list.

In [116]:
len(list5)
Out[116]:
7668

Printing the first 20 elements in the 'list5' list.

In [111]:
print(list5[:20])
['The Shining', 'The Blue Lagoon', 'Star Wars: Episode V - The Empire Strikes Back', 'Airplane!', 'Caddyshack', 'Friday the 13th', 'The Blues Brothers', 'Raging Bull', 'Superman II', 'The Long Riders', 'Any Which Way You Can', 'The Gods Must Be Crazy', 'Popeye', 'Ordinary People', 'Dressed to Kill', 'Somewhere in Time', 'Fame', '9 to 5', 'The Fog', 'Stir Crazy']

Checking if the dataframes have some elements in the movie column that are the same.

In [16]:
print(common_data(list5, list3),common_data(list1, list5),common_data(list5, list3),
      common_data(list3, list4))
True True True True

This is the 'commonelem_set' function, which returns the set of elements that two lists have in common.

In [17]:
def commonelem_set(z, x):
    one = set(z)
    two = set(x)
    if (one & two):
        return ("There are common elements in both lists:", one & two)
    else:
        return ("There are no common elements")

movie_budgets_df has 1238 movies in common with the movie_gross_df dataframe.

In [18]:
len(commonelem_set(list1, list2)[1])
Out[18]:
1238

movie_budgets_df has 2312 movies in common with the movie_basics_df dataframe.

In [19]:
len(commonelem_set(list1, list3)[1])
Out[19]:
2312

movie_budgets_df has 3551 movies in common with the movie_df dataframe.

In [20]:
len(commonelem_set(list1, list5)[1])
Out[20]:
3551

movie_gross_df has 2605 movies in common with the movie_basics_df dataframe.

In [21]:
len(commonelem_set(list2, list3)[1])
Out[21]:
2605
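The four counts above can also be produced in one pass by intersecting every pair of title lists. The helper name `overlap_counts` and the short sample lists are hypothetical; with the real lists, the dictionary values would be the counts shown above.

```python
from itertools import combinations


def overlap_counts(named_lists):
    """Count the shared elements for every pair of named lists."""
    sets = {name: set(values) for name, values in named_lists.items()}
    return {(a, b): len(sets[a] & sets[b]) for a, b in combinations(sets, 2)}


# Made-up sample titles standing in for list1, list2 and list3
sample = {
    "budgets": ["Avatar", "Tangled", "Spectre"],
    "gross": ["Tangled", "Inception", "Spectre"],
    "basics": ["Sunghursh", "Tangled"],
}
print(overlap_counts(sample))
# {('budgets', 'gross'): 2, ('budgets', 'basics'): 1, ('gross', 'basics'): 1}
```

Because the helper always returns counts, it avoids indexing into a return value that is a tuple in one case and a plain string in the other.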

Merging movie_basics_df dataframe with movie_ratings_df dataframe using the movie_id column to create the movie_rating_basics dataframe.

In [117]:
movie_rating_basics = movie_ratings_df.merge(movie_basics_df,on='movie_id')

Checking the dataframe and getting the first five rows of the dataframe

In [118]:
movie_rating_basics.head()
Out[118]:
movie_id averagerating numvotes movie original_title start_year runtime_minutes genres
0 tt10356526 8.3 31 Laiye Je Yaarian Laiye Je Yaarian 2019 117.0 Romance
1 tt10384606 8.9 559 Borderless Borderless 2019 87.0 Documentary
2 tt1042974 6.4 20 Just Inès Just Inès 2010 90.0 Drama
3 tt1043726 4.2 50352 The Legend of Hercules The Legend of Hercules 2014 99.0 Action,Adventure,Fantasy
4 tt1060240 6.5 21 Até Onde? Até Onde? 2011 73.0 Mystery,Thriller

Merging movie_budgets_df dataframe with movie_gross_df dataframe using the movie column to create the df1 dataframe.

In [119]:
df1 = movie_budgets_df.merge(movie_gross_df,on='movie')

Checking the dataframe and getting the first five rows of the dataframe

In [120]:
df1.head()
Out[120]:
id release_date movie production_budget domestic_gross_x worldwide_gross studio domestic_gross_y foreign_gross year
0 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides $410,600,000 $241,063,875 $1,045,663,875 BV 241100000.0 804600000 2011
1 4 May 1, 2015 Avengers: Age of Ultron $330,600,000 $459,005,868 $1,403,013,963 BV 459000000.0 946400000 2015
2 7 Apr 27, 2018 Avengers: Infinity War $300,000,000 $678,815,482 $2,048,134,200 BV 678800000.0 1,369.5 2018
3 9 Nov 17, 2017 Justice League $300,000,000 $229,024,295 $655,945,209 WB 229000000.0 428900000 2017
4 10 Nov 6, 2015 Spectre $300,000,000 $200,074,175 $879,620,923 Sony 200100000.0 680600000 2015

Merging df1 dataframe with movie_rating_basics dataframe using the movie column to create the df2 dataframe.

In [121]:
df2 = df1.merge(movie_rating_basics,on='movie')

Checking the dataframe and getting the first five rows of the dataframe

In [122]:
df2.head()
Out[122]:
id release_date movie production_budget domestic_gross_x worldwide_gross studio domestic_gross_y foreign_gross year movie_id averagerating numvotes original_title start_year runtime_minutes genres
0 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides $410,600,000 $241,063,875 $1,045,663,875 BV 241100000.0 804600000 2011 tt1298650 6.6 447624 Pirates of the Caribbean: On Stranger Tides 2011 136.0 Action,Adventure,Fantasy
1 4 May 1, 2015 Avengers: Age of Ultron $330,600,000 $459,005,868 $1,403,013,963 BV 459000000.0 946400000 2015 tt2395427 7.3 665594 Avengers: Age of Ultron 2015 141.0 Action,Adventure,Sci-Fi
2 7 Apr 27, 2018 Avengers: Infinity War $300,000,000 $678,815,482 $2,048,134,200 BV 678800000.0 1,369.5 2018 tt4154756 8.5 670926 Avengers: Infinity War 2018 149.0 Action,Adventure,Sci-Fi
3 9 Nov 17, 2017 Justice League $300,000,000 $229,024,295 $655,945,209 WB 229000000.0 428900000 2017 tt0974015 6.5 329135 Justice League 2017 120.0 Action,Adventure,Fantasy
4 10 Nov 6, 2015 Spectre $300,000,000 $200,074,175 $879,620,923 Sony 200100000.0 680600000 2015 tt2379713 6.8 352504 Spectre 2015 148.0 Action,Adventure,Thriller

Merging df2 dataframe with movie_df dataframe using the movie column to create the df3 dataframe. This is the last merge.

In [185]:
df3 = df2.merge(movie_df,on='movie')

Checking the dataframe and getting the first five rows of the dataframe

In [186]:
df3.head()
Out[186]:
id release_date movie production_budget domestic_gross_x worldwide_gross studio domestic_gross_y foreign_gross year_x ... score votes director writer star country budget gross company runtime
0 4 May 1, 2015 Avengers: Age of Ultron $330,600,000 $459,005,868 $1,403,013,963 BV 459000000.0 946400000 2015 ... 7.3 777000.0 Joss Whedon Joss Whedon Robert Downey Jr. United States 250000000.0 1.402810e+09 Marvel Studios 141.0
1 7 Apr 27, 2018 Avengers: Infinity War $300,000,000 $678,815,482 $2,048,134,200 BV 678800000.0 1,369.5 2018 ... 8.4 897000.0 Anthony Russo Christopher Markus Robert Downey Jr. United States 321000000.0 2.048360e+09 Marvel Studios 149.0
2 9 Nov 17, 2017 Justice League $300,000,000 $229,024,295 $655,945,209 WB 229000000.0 428900000 2017 ... 6.1 418000.0 Zack Snyder Jerry Siegel Ben Affleck United States 300000000.0 6.579270e+08 Warner Bros. 120.0
3 10 Nov 6, 2015 Spectre $300,000,000 $200,074,175 $879,620,923 Sony 200100000.0 680600000 2015 ... 6.8 393000.0 Sam Mendes John Logan Daniel Craig United Kingdom 245000000.0 8.806815e+08 B24 148.0
4 11 Jul 20, 2012 The Dark Knight Rises $275,000,000 $448,139,099 $1,084,439,099 WB 448100000.0 636800000 2012 ... 8.4 1600000.0 Christopher Nolan Jonathan Nolan Christian Bale United Kingdom 250000000.0 1.081143e+09 Warner Bros. 164.0

5 rows × 31 columns

Information of all the dataframes merged to create df3.

In [26]:
df3.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1262 entries, 0 to 1261
Data columns (total 31 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 1262 non-null   int64  
 1   release_date       1262 non-null   object 
 2   movie              1262 non-null   object 
 3   production_budget  1262 non-null   object 
 4   domestic_gross_x   1262 non-null   object 
 5   worldwide_gross    1262 non-null   object 
 6   studio             1262 non-null   object 
 7   domestic_gross_y   1262 non-null   float64
 8   foreign_gross      1128 non-null   object 
 9   year_x             1262 non-null   int64  
 10  movie_id           1262 non-null   object 
 11  averagerating      1262 non-null   float64
 12  numvotes           1262 non-null   int64  
 13  original_title     1262 non-null   object 
 14  start_year         1262 non-null   int64  
 15  runtime_minutes    1237 non-null   float64
 16  genres             1254 non-null   object 
 17  rating             1262 non-null   object 
 18  genre              1262 non-null   object 
 19  year_y             1262 non-null   int64  
 20  released           1261 non-null   object 
 21  score              1262 non-null   float64
 22  votes              1262 non-null   float64
 23  director           1262 non-null   object 
 24  writer             1262 non-null   object 
 25  star               1262 non-null   object 
 26  country            1261 non-null   object 
 27  budget             1183 non-null   float64
 28  gross              1261 non-null   float64
 29  company            1261 non-null   object 
 30  runtime            1261 non-null   float64
dtypes: float64(8), int64(5), object(18)
memory usage: 315.5+ KB
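df3 has 1262 rows, while movie_budgets_df and movie_gross_df share only 1238 distinct titles, which suggests that some titles appear more than once in at least one dataframe and are repeated by the merges. The sketch below shows that effect on two made-up dataframes and how the `validate` option of `merge` can flag it; the titles, budgets, and ids are hypothetical.

```python
import pandas as pd

# Made-up data: 'Robin Hood' appears twice in the second dataframe
budgets = pd.DataFrame({
    "movie": ["Robin Hood", "Tangled"],
    "production_budget": ["$1,000", "$2,000"],
})
basics = pd.DataFrame({
    "movie": ["Robin Hood", "Robin Hood", "Tangled"],
    "movie_id": ["id_a", "id_b", "id_c"],
})

# An inner merge on the title keeps one row per matching pair
merged = budgets.merge(basics, on="movie")
print(len(merged))  # 3 rows from 2 budget rows

# Titles that occur more than once in the second dataframe
repeated_titles = basics.loc[basics["movie"].duplicated(keep=False), "movie"].unique().tolist()

# validate raises MergeError when the merge keys are not unique on both sides
try:
    budgets.merge(basics, on="movie", validate="one_to_one")
    unique_match = True
except pd.errors.MergeError:
    unique_match = False
```

Merging on a title together with a year column, where both dataframes have one, is one possible way to reduce such accidental matches.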

Dropping unwanted columns from dataframe df3.

In [187]:
df3 = df3.drop(['id',  'movie_id', 'numvotes', 'original_title', 'start_year', 'genres', 
          'year_y','released','score','votes','country','gross','runtime_minutes',
          'budget','domestic_gross_y'], axis=1)

Dropping the unwanted 'year_x' column from dataframe df3.

In [188]:
df3= df3.drop(['year_x'], axis=1)

Checking that the columns were removed.

In [127]:
df3.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1262 entries, 0 to 1261
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   release_date       1262 non-null   object 
 1   movie              1262 non-null   object 
 2   production_budget  1262 non-null   object 
 3   domestic_gross_x   1262 non-null   object 
 4   worldwide_gross    1262 non-null   object 
 5   studio             1262 non-null   object 
 6   foreign_gross      1128 non-null   object 
 7   averagerating      1262 non-null   float64
 8   rating             1262 non-null   object 
 9   genre              1262 non-null   object 
 10  director           1262 non-null   object 
 11  writer             1262 non-null   object 
 12  star               1262 non-null   object 
 13  company            1261 non-null   object 
 14  runtime            1261 non-null   float64
dtypes: float64(2), object(13)
memory usage: 157.8+ KB

After merging all the dataframes and taking out some columns, we are going to modify the remaining columns to create the final dataframe that will be used in this analysis. The finished dataframe, the Drama Dataframe, is the final result of editing all the other dataframes and will hold the columns listed below.

  • Columns that will be in the Drama Dataframe:
    • Movie: The name of the movie that will be analyzed
    • Release_Date: Release date of the movie (YYYY-MM-DD)
    • Genre: The main genre of the movie
    • Rating: System rating of the movie (R, PG, etc.)
    • Production_Budget: The budget of the movie as an integer
    • Production_Budget_x: The budget of the movie in currency
    • Domestic_Gross: The domestic revenue generated by the movie as an integer
    • Domestic_Gross_x: The domestic revenue generated by the movie in currency
    • Foreign_Gross: The international revenue generated by the movie as an integer
    • Foreign_Gross_x: The international revenue generated by the movie in currency
    • Worldwide_Gross: The entire revenue generated by the movie in both domestic and international sales, as an integer
    • Worldwide_Gross_x: The entire revenue generated by the movie in both domestic and international sales, in currency
    • Profit: The profit of the movie, found by subtracting the budget of the movie from its worldwide revenue, as an integer
    • Profit_x: The profit of the movie, found by subtracting the budget of the movie from its worldwide revenue, as a string
    • Tickets: The number of tickets sold as an integer
    • Tickets_x: The number of tickets sold as a string
    • Runtime: Duration of the movie
    • Averagerating: IMDb user rating of the movie
    • Company: The production company of the movie
    • Star: Main actor/actress of the movie
    • Director: The director of the movie
    • Writer: Writer of the movie
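The Profit column described in the list is worldwide revenue minus production budget. The sketch below works that through with the Avatar figures shown in the movie_budgets_df output. The return on investment formula (profit divided by budget, as a percentage) is the common definition and is an assumption here, as are the function names.

```python
def profit(worldwide_gross, production_budget):
    """Profit = worldwide revenue minus production budget."""
    return worldwide_gross - production_budget


def roi_percent(worldwide_gross, production_budget):
    """Return on investment as a percentage of the budget (assumed definition)."""
    return profit(worldwide_gross, production_budget) / production_budget * 100


# Avatar row from movie_budgets_df.head()
avatar_budget = 425_000_000
avatar_worldwide = 2_776_345_279

avatar_profit = profit(avatar_worldwide, avatar_budget)
avatar_roi = roi_percent(avatar_worldwide, avatar_budget)

print(avatar_profit)  # 2351345279
print(avatar_roi)
```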

Creating the Production_Budget column by taking the production budget column from the df3 dataframe and converting it from currency into integer.

In [128]:
storage1 = []
storage2 = []
production_budget_x=[]
for i in  df3.production_budget:
    storage1.append(i.replace('$',''))
for i in storage1:
    storage2.append(i.replace(',',''))
for i in storage2:   
    i = int(i)
    production_budget_x.append(i)
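
The three-pass loop above can also be written as one small helper, which would avoid repeating the same loops for each currency column. The following is a minimal sketch, not the notebook's own code: `currency_to_int` is a hypothetical name, and it assumes every value is a string such as '$330,600,000' with no decimals and no missing values.

```python
def currency_to_int(text):
    """Convert a currency string such as '$330,600,000' to an integer."""
    # Remove the dollar sign and the thousands separators, then convert.
    return int(text.replace('$', '').replace(',', ''))


# Sample values copied from the production budget column shown in this section.
sample_budgets = ['$330,600,000', '$300,000,000', '$99,000,000']
budgets_as_int = [currency_to_int(value) for value in sample_budgets]
print(budgets_as_int)
```

The same helper could be applied to the domestic and worldwide gross columns, since they use the same string format.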

The 'storage1' list is the intermediate result: the budget strings with the '$' removed, before the commas are stripped and the values are converted to integers.

In [135]:
print(storage1[:40])
['330,600,000', '300,000,000', '300,000,000', '300,000,000', '275,000,000', '275,000,000', '275,000,000', '275,000,000', '260,000,000', '250,000,000', '250,000,000', '250,000,000', '250,000,000', '250,000,000', '250,000,000', '230,000,000', '225,000,000', '220,000,000', '220,000,000', '217,000,000', '215,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '99,000,000', '99,000,000', '99,000,000', '99,000,000', '99,000,000', '99,000,000', '99,000,000', '99,000,000', '200,000,000', '200,000,000']

The 'production_budget_x' list is the result of converting the strings to integers.

In [132]:
print(production_budget_x[:40])
[330600000, 300000000, 300000000, 300000000, 275000000, 275000000, 275000000, 275000000, 260000000, 250000000, 250000000, 250000000, 250000000, 250000000, 250000000, 230000000, 225000000, 220000000, 220000000, 217000000, 215000000, 210000000, 210000000, 210000000, 210000000, 210000000, 210000000, 210000000, 210000000, 210000000, 99000000, 99000000, 99000000, 99000000, 99000000, 99000000, 99000000, 99000000, 200000000, 200000000]

Checking the number of elements in the 'production_budget_x' list.

In [30]:
len(production_budget_x)
Out[30]:
1262

Creating the Domestic_Gross column by converting the domestic gross column of the df3 dataframe from currency to integer

In [136]:
storage1 = []
storage2 = []
domestic_gross_y = []
for i in  df3.domestic_gross_x:
    storage1.append(i.replace('$',''))
for i in storage1:
    storage2.append(i.replace(',',''))
for i in storage2:   
    i = int(i)
    domestic_gross_y.append(i)

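The 'storage1' list is the intermediate result: the domestic gross strings with the '$' removed, before the commas are stripped and the values are converted to integers.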

In [137]:
print(storage1[:40])
['459,005,868', '678,815,482', '229,024,295', '200,074,175', '448,139,099', '213,767,512', '89,302,115', '73,058,679', '200,821,936', '408,084,349', '330,360,194', '303,003,568', '258,366,855', '255,119,788', '225,764,765', '172,558,876', '291,045,518', '262,030,663', '65,233,400', '130,168,683', '652,270,625', '245,439,076', '105,487,148', '105,487,148', '105,487,148', '105,487,148', '105,487,148', '105,487,148', '105,487,148', '105,487,148', '30,824,628', '30,824,628', '30,824,628', '30,824,628', '30,824,628', '30,824,628', '30,824,628', '30,824,628', '700,059,566', '608,581,744']

The 'domestic_gross_y' list is the result of converting the strings to integers.

In [138]:
print(domestic_gross_y[:40])
[459005868, 678815482, 229024295, 200074175, 448139099, 213767512, 89302115, 73058679, 200821936, 408084349, 330360194, 303003568, 258366855, 255119788, 225764765, 172558876, 291045518, 262030663, 65233400, 130168683, 652270625, 245439076, 105487148, 105487148, 105487148, 105487148, 105487148, 105487148, 105487148, 105487148, 30824628, 30824628, 30824628, 30824628, 30824628, 30824628, 30824628, 30824628, 700059566, 608581744]

Checking the number of elements in the 'domestic_gross_y' list.

In [31]:
len(domestic_gross_y)
Out[31]:
1262

Creating the Worldwide_Gross column by converting the worldwide gross column of the df3 dataframe from currency to integer

In [139]:
storage1 = []
storage2 = []
worldwide_gross_x=[]
for i in  df3.worldwide_gross:
    storage1.append(i.replace('$',''))
for i in storage1:
    storage2.append(i.replace(',',''))
for i in storage2:   
    i = int(i)
    worldwide_gross_x.append(i)

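The 'storage1' list is the intermediate result: the worldwide gross strings with the '$' removed, before the commas are stripped and the values are converted to integers.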

In [141]:
print(storage1[:40])
['1,403,013,963', '2,048,134,200', '655,945,209', '879,620,923', '1,084,439,099', '393,151,347', '260,002,115', '282,778,100', '586,477,240', '1,140,069,413', '867,500,281', '1,017,003,568', '960,366,855', '945,577,621', '1,234,846,267', '788,241,137', '667,999,518', '757,890,267', '313,477,717', '602,893,340', '1,648,854,864', '1,104,039,076', '322,459,006', '322,459,006', '322,459,006', '322,459,006', '322,459,006', '322,459,006', '322,459,006', '322,459,006', '84,747,441', '84,747,441', '84,747,441', '84,747,441', '84,747,441', '84,747,441', '84,747,441', '84,747,441', '1,348,258,224', '1,242,520,711']

The 'worldwide_gross_x' list is the result of converting the strings to integers.

In [140]:
print(worldwide_gross_x[:40])
[1403013963, 2048134200, 655945209, 879620923, 1084439099, 393151347, 260002115, 282778100, 586477240, 1140069413, 867500281, 1017003568, 960366855, 945577621, 1234846267, 788241137, 667999518, 757890267, 313477717, 602893340, 1648854864, 1104039076, 322459006, 322459006, 322459006, 322459006, 322459006, 322459006, 322459006, 322459006, 84747441, 84747441, 84747441, 84747441, 84747441, 84747441, 84747441, 84747441, 1348258224, 1242520711]

Checking the number of elements in the 'worldwide_gross_x' list.

In [32]:
len(worldwide_gross_x)
Out[32]:
1262

Creating a function that checks if an element is 'NaN'.

In [164]:
def isNaN(num):
    return num != num
isNaN(8)# testing the function 
Out[164]:
False
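
The comparison `num != num` works because NaN is the only float value that is not equal to itself. A small sketch of that behaviour, using values chosen for illustration:

```python
import math


def isNaN(num):
    # NaN is the only value for which num != num is True.
    return num != num


nan_value = float('nan')
print(isNaN(nan_value))        # True
print(isNaN(8))                # False
print(isNaN('946,400,000'))    # False: a string is always equal to itself
print(math.isnan(nan_value))   # True: the standard-library check for floats
```

Unlike `math.isnan`, this function accepts strings without raising an error, so it can be called on a column that mixes strings and NaN values.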

Creating the Foreign_Gross_x column by turning the foreign gross column from the df3 dataframe from integer to currency

In [146]:
import math

demo = []
foreign_gross_x = []
for i in df3.foreign_gross:
    if isinstance(i, str):
        # Strings lose their commas and are converted to integers.
        # (A string is never NaN, so no NaN check is needed here.)
        demo.append(int(float(i.replace(",", ""))))
    else:
        # Non-string values, including NaN, are kept as they are.
        demo.append(i)
for i in demo:
    if math.isnan(i):
        foreign_gross_x.append(i)  # missing values stay NaN
    else:
        foreign_gross_x.append("${:,.0f}".format(i))

The 'demo' list holds the foreign gross values as integers, before they are formatted as currency.

In [152]:
print(demo[:40])
[946400000, 1369, 428900000, 680600000, 636800000, 179200000, 171200000, 211100000, 391000000, 745200000, 543300000, 718100000, 700000000, 700900000, 1010, 622300000, 377000000, 495900000, 237600000, 475300000, 1019, 858600000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 646900000, 634200000]

The 'foreign_gross_x' list is the result of converting the integers to currency strings.

In [153]:
print(foreign_gross_x[:40])
['$946,400,000', '$1,369', '$428,900,000', '$680,600,000', '$636,800,000', '$179,200,000', '$171,200,000', '$211,100,000', '$391,000,000', '$745,200,000', '$543,300,000', '$718,100,000', '$700,000,000', '$700,900,000', '$1,010', '$622,300,000', '$377,000,000', '$495,900,000', '$237,600,000', '$475,300,000', '$1,019', '$858,600,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$646,900,000', '$634,200,000']

Checking the number of elements in the 'foreign_gross_x' list.

In [147]:
len(foreign_gross_x)
Out[147]:
1262

Creating the Foreign_Gross column by turning the foreign gross column from df3 dataframe into integer

In [155]:
foreign_gross = []
for i in df3.foreign_gross:
    if isinstance(i, str):
        i = i.replace(",","")
        i = int(float(i))
        foreign_gross.append(i)
    else:foreign_gross.append(i)

The first 40 elements of the 'foreign_gross' list.

In [157]:
print(foreign_gross[:40])
[946400000, 1369, 428900000, 680600000, 636800000, 179200000, 171200000, 211100000, 391000000, 745200000, 543300000, 718100000, 700000000, 700900000, 1010, 622300000, 377000000, 495900000, 237600000, 475300000, 1019, 858600000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 646900000, 634200000]

Checking the number of elements in the 'foreign_gross' list.

In [35]:
len(foreign_gross)
Out[35]:
1262

After the columns are created, they are added to the df3 dataframe.

In [189]:
df3['foreign_gross']=foreign_gross
df3['worldwide_gross_x']=worldwide_gross_x
df3['foreign_gross_x']=foreign_gross_x
df3['production_budget_x']=production_budget_x
df3['domestic_gross_y']=domestic_gross_y

Creating the Profit column by subtracting the production budget column from the worldwide gross column of the df3 dataframe, which gives the profit of each movie as an integer

In [162]:
profit = []
for x,y in enumerate(df3.worldwide_gross_x):
    profit.append(y-df3.production_budget_x[x])
print(profit[:40]) #showing the profit list 
[1072413963, 1748134200, 355945209, 579620923, 809439099, 118151347, -14997885, 7778100, 326477240, 890069413, 617500281, 767003568, 710366855, 695577621, 984846267, 558241137, 442999518, 537890267, 93477717, 385893340, 1433854864, 894039076, 112459006, 112459006, 112459006, 112459006, 112459006, 112459006, 112459006, 112459006, -14252559, -14252559, -14252559, -14252559, -14252559, -14252559, -14252559, -14252559, 1148258224, 1042520711]
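
The profit calculation is an element-wise subtraction: worldwide gross minus production budget. As a sketch, the same result can be obtained by pairing the two lists with `zip`, which avoids looking values up by index. The sample values below are the first three rows shown in this section, and the variable names are illustrative.

```python
# First three values of worldwide_gross_x and production_budget_x.
sample_worldwide = [1403013963, 2048134200, 655945209]
sample_budget = [330600000, 300000000, 300000000]

# Profit = worldwide gross - production budget, row by row.
sample_profit = [gross - budget
                 for gross, budget in zip(sample_worldwide, sample_budget)]
print(sample_profit)
```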

Checking the number of elements in the 'profit' list.

In [37]:
len(profit)
Out[37]:
1262

Creating the Profit_x column by turning the elements in the profit list to a currency

In [163]:
profit_x = []
for i in profit:
    profit_x.append("${:,.0f}".format(i))
print(profit_x[:40]) #showing the profit_x list 
['$1,072,413,963', '$1,748,134,200', '$355,945,209', '$579,620,923', '$809,439,099', '$118,151,347', '$-14,997,885', '$7,778,100', '$326,477,240', '$890,069,413', '$617,500,281', '$767,003,568', '$710,366,855', '$695,577,621', '$984,846,267', '$558,241,137', '$442,999,518', '$537,890,267', '$93,477,717', '$385,893,340', '$1,433,854,864', '$894,039,076', '$112,459,006', '$112,459,006', '$112,459,006', '$112,459,006', '$112,459,006', '$112,459,006', '$112,459,006', '$112,459,006', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$1,148,258,224', '$1,042,520,711']
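
Note that this format string places the minus sign after the dollar sign for losses (for example '$-14,997,885'). If the more conventional '-$14,997,885' form is preferred, a sketch such as the following could be used; `to_currency` is a hypothetical helper, not part of the notebook.

```python
def to_currency(amount):
    """Format a number as currency, with the sign before the dollar sign."""
    sign = '-' if amount < 0 else ''
    return "{}${:,.0f}".format(sign, abs(amount))


print(to_currency(1072413963))
print(to_currency(-14997885))
```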

Checking the number of elements in the 'profit_x' list.

In [38]:
len(profit_x)
Out[38]:
1262

Creating the Tickets column by dividing the worldwide gross column in df3 by 10, the assumed average worldwide ticket price, to estimate the number of tickets sold for each movie.

In [166]:
no_tickets = []
for i in df3.worldwide_gross_x:
    no_tickets.append(round(i/10))
print(no_tickets[:40]) #showing the no_tickets list 
[140301396, 204813420, 65594521, 87962092, 108443910, 39315135, 26000212, 28277810, 58647724, 114006941, 86750028, 101700357, 96036686, 94557762, 123484627, 78824114, 66799952, 75789027, 31347772, 60289334, 164885486, 110403908, 32245901, 32245901, 32245901, 32245901, 32245901, 32245901, 32245901, 32245901, 8474744, 8474744, 8474744, 8474744, 8474744, 8474744, 8474744, 8474744, 134825822, 124252071]
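
The ticket estimate is simply worldwide gross divided by an assumed average ticket price. Below is a minimal sketch with the price as a named parameter, so the assumption of 10 is easy to change; `estimate_tickets` and `AVERAGE_TICKET_PRICE` are illustrative names.

```python
AVERAGE_TICKET_PRICE = 10  # assumed average worldwide ticket price, as in the text


def estimate_tickets(worldwide_gross, price=AVERAGE_TICKET_PRICE):
    """Estimate tickets sold as gross revenue divided by ticket price."""
    return round(worldwide_gross / price)


print(estimate_tickets(1403013963))
print(estimate_tickets(84747441))
```

Python's built-in `round` sends exact halves to the nearest even integer (`round(2.5)` is 2), so a gross ending in 5 does not always round up.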

Checking the number of elements in the 'no_tickets' list.

In [39]:
len(no_tickets)
Out[39]:
1262

Creating the Tickets_x column by turning the elements in the no_tickets list to a string

In [167]:
str_tickets = [] 
for i in no_tickets:
    str_tickets.append("{:,.0f}".format(i))
print(str_tickets[:40]) #showing the str_tickets list 
['140,301,396', '204,813,420', '65,594,521', '87,962,092', '108,443,910', '39,315,135', '26,000,212', '28,277,810', '58,647,724', '114,006,941', '86,750,028', '101,700,357', '96,036,686', '94,557,762', '123,484,627', '78,824,114', '66,799,952', '75,789,027', '31,347,772', '60,289,334', '164,885,486', '110,403,908', '32,245,901', '32,245,901', '32,245,901', '32,245,901', '32,245,901', '32,245,901', '32,245,901', '32,245,901', '8,474,744', '8,474,744', '8,474,744', '8,474,744', '8,474,744', '8,474,744', '8,474,744', '8,474,744', '134,825,822', '124,252,071']

Checking the number of elements in the 'str_tickets' list.

In [40]:
len(str_tickets)
Out[40]:
1262

The newly created columns are then added to the df3 dataframe.

In [190]:
df3['Profit']=profit
df3['Profit_x']=profit_x
df3['Tickets']=no_tickets
df3['Tickets_x']=str_tickets

Checking the dataframe and getting the first five rows of the dataframe

In [169]:
df3.head()
Out[169]:
release_date movie production_budget domestic_gross_x worldwide_gross studio foreign_gross averagerating rating genre ... company runtime worldwide_gross_x foreign_gross_x production_budget_x domestic_gross_y Profit Profit_x Tickets Tickets_x
0 May 1, 2015 Avengers: Age of Ultron $330,600,000 $459,005,868 $1,403,013,963 BV 946400000.0 7.3 PG-13 Action ... Marvel Studios 141.0 1403013963 $946,400,000 330600000 459005868 1072413963 $1,072,413,963 140301396 140,301,396
1 Apr 27, 2018 Avengers: Infinity War $300,000,000 $678,815,482 $2,048,134,200 BV 1369.0 8.5 PG-13 Action ... Marvel Studios 149.0 2048134200 $1,369 300000000 678815482 1748134200 $1,748,134,200 204813420 204,813,420
2 Nov 17, 2017 Justice League $300,000,000 $229,024,295 $655,945,209 WB 428900000.0 6.5 PG-13 Action ... Warner Bros. 120.0 655945209 $428,900,000 300000000 229024295 355945209 $355,945,209 65594521 65,594,521
3 Nov 6, 2015 Spectre $300,000,000 $200,074,175 $879,620,923 Sony 680600000.0 6.8 PG-13 Action ... B24 148.0 879620923 $680,600,000 300000000 200074175 579620923 $579,620,923 87962092 87,962,092
4 Jul 20, 2012 The Dark Knight Rises $275,000,000 $448,139,099 $1,084,439,099 WB 636800000.0 8.4 PG-13 Action ... Warner Bros. 164.0 1084439099 $636,800,000 275000000 448139099 809439099 $809,439,099 108443910 108,443,910

5 rows × 23 columns

Making sure the dataframe's columns are aligned.

In [43]:
df3.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1262 entries, 0 to 1261
Data columns (total 23 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   release_date         1262 non-null   object 
 1   movie                1262 non-null   object 
 2   production_budget    1262 non-null   object 
 3   domestic_gross_x     1262 non-null   object 
 4   worldwide_gross      1262 non-null   object 
 5   studio               1262 non-null   object 
 6   foreign_gross        1128 non-null   float64
 7   averagerating        1262 non-null   float64
 8   rating               1262 non-null   object 
 9   genre                1262 non-null   object 
 10  director             1262 non-null   object 
 11  writer               1262 non-null   object 
 12  star                 1262 non-null   object 
 13  company              1261 non-null   object 
 14  runtime              1261 non-null   float64
 15  worldwide_gross_x    1262 non-null   int64  
 16  foreign_gross_x      1128 non-null   object 
 17  production_budget_x  1262 non-null   int64  
 18  domestic_gross_y     1262 non-null   int64  
 19  Profit               1262 non-null   int64  
 20  Profit_x             1262 non-null   object 
 21  Tickets              1262 non-null   int64  
 22  Tickets_x            1262 non-null   object 
dtypes: float64(3), int64(5), object(15)
memory usage: 268.9+ KB

Rearranging the columns in dataframe 'df3'.

In [191]:
df3 = df3[['movie','release_date','genre','rating','production_budget_x','production_budget',
           'domestic_gross_y','domestic_gross_x','foreign_gross','foreign_gross_x','worldwide_gross',
           'worldwide_gross_x','Profit','Profit_x','Tickets','Tickets_x','runtime','averagerating',
           'company','studio','star','director','writer']]

Checking the dataframe and getting the first five rows of the dataframe

In [192]:
df3.head()
Out[192]:
movie release_date genre rating production_budget_x production_budget domestic_gross_y domestic_gross_x foreign_gross foreign_gross_x ... Profit_x Tickets Tickets_x runtime averagerating company studio star director writer
0 Avengers: Age of Ultron May 1, 2015 Action PG-13 330600000 $330,600,000 459005868 $459,005,868 946400000.0 $946,400,000 ... $1,072,413,963 140301396 140,301,396 141.0 7.3 Marvel Studios BV Robert Downey Jr. Joss Whedon Joss Whedon
1 Avengers: Infinity War Apr 27, 2018 Action PG-13 300000000 $300,000,000 678815482 $678,815,482 1369.0 $1,369 ... $1,748,134,200 204813420 204,813,420 149.0 8.5 Marvel Studios BV Robert Downey Jr. Anthony Russo Christopher Markus
2 Justice League Nov 17, 2017 Action PG-13 300000000 $300,000,000 229024295 $229,024,295 428900000.0 $428,900,000 ... $355,945,209 65594521 65,594,521 120.0 6.5 Warner Bros. WB Ben Affleck Zack Snyder Jerry Siegel
3 Spectre Nov 6, 2015 Action PG-13 300000000 $300,000,000 200074175 $200,074,175 680600000.0 $680,600,000 ... $579,620,923 87962092 87,962,092 148.0 6.8 B24 Sony Daniel Craig Sam Mendes John Logan
4 The Dark Knight Rises Jul 20, 2012 Action PG-13 275000000 $275,000,000 448139099 $448,139,099 636800000.0 $636,800,000 ... $809,439,099 108443910 108,443,910 164.0 8.4 Warner Bros. WB Christian Bale Christopher Nolan Jonathan Nolan

5 rows × 23 columns

Renaming the columns in dataframe 'df3'.

In [193]:
df3.columns = ['Movie','Release_Date','Genre','Rating','Production_Budget','Production_Budget_x',
           'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
           'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','Runtime','Averagerating',
            'Company','Studio','Star','Director','Writer']

Checking the dataframe and getting the first five rows of the dataframe

In [194]:
df3.head()
Out[194]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit_x Tickets Tickets_x Runtime Averagerating Company Studio Star Director Writer
0 Avengers: Age of Ultron May 1, 2015 Action PG-13 330600000 $330,600,000 459005868 $459,005,868 946400000.0 $946,400,000 ... $1,072,413,963 140301396 140,301,396 141.0 7.3 Marvel Studios BV Robert Downey Jr. Joss Whedon Joss Whedon
1 Avengers: Infinity War Apr 27, 2018 Action PG-13 300000000 $300,000,000 678815482 $678,815,482 1369.0 $1,369 ... $1,748,134,200 204813420 204,813,420 149.0 8.5 Marvel Studios BV Robert Downey Jr. Anthony Russo Christopher Markus
2 Justice League Nov 17, 2017 Action PG-13 300000000 $300,000,000 229024295 $229,024,295 428900000.0 $428,900,000 ... $355,945,209 65594521 65,594,521 120.0 6.5 Warner Bros. WB Ben Affleck Zack Snyder Jerry Siegel
3 Spectre Nov 6, 2015 Action PG-13 300000000 $300,000,000 200074175 $200,074,175 680600000.0 $680,600,000 ... $579,620,923 87962092 87,962,092 148.0 6.8 B24 Sony Daniel Craig Sam Mendes John Logan
4 The Dark Knight Rises Jul 20, 2012 Action PG-13 275000000 $275,000,000 448139099 $448,139,099 636800000.0 $636,800,000 ... $809,439,099 108443910 108,443,910 164.0 8.4 Warner Bros. WB Christian Bale Christopher Nolan Jonathan Nolan

5 rows × 23 columns

The movies from the df3 dataframe are counted by genre.

In [195]:
# putting movies into groups
grouped = []
for i in df3.Genre:
    grouped.append(i)
grouped=Counter(grouped)
grouped
Out[195]:
Counter({'Action': 404,
         'Animation': 92,
         'Adventure': 61,
         'Drama': 200,
         'Comedy': 243,
         'Biography': 118,
         'Horror': 63,
         'Crime': 66,
         'Mystery': 2,
         'Romance': 1,
         'Fantasy': 8,
         'Sci-Fi': 3,
         'Thriller': 1})
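
`Counter` accepts any iterable directly, so the intermediate list is optional. A short sketch on a made-up sample of genres (in the notebook the input would be `df3.Genre`):

```python
from collections import Counter

# Made-up sample standing in for df3.Genre.
sample_genres = ['Action', 'Drama', 'Action', 'Comedy', 'Drama', 'Action']

genre_counts = Counter(sample_genres)
print(genre_counts)
# most_common() lists the genres from most to least frequent.
print(genre_counts.most_common(2))
```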

The movies from the df3 dataframe are counted by system rating.

In [196]:
# putting movies into groups
grouped1 = []
for i in df3.Rating:
    grouped1.append(i)
grouped1= Counter(grouped1)
grouped1
Out[196]:
Counter({'PG-13': 534,
         'PG': 161,
         'G': 9,
         'R': 544,
         'Not Rated': 11,
         'Unrated': 1,
         'NC-17': 2})

Getting the index of all the drama genre movies in the df3 dataframe

In [197]:
drama_index = []
for i,x in enumerate(df3.Genre):
    if x == 'Drama':drama_index.append(i)
print(drama_index) #showing the drama_index list 
[65, 66, 128, 188, 203, 273, 308, 322, 328, 347, 349, 365, 368, 373, 374, 376, 380, 391, 398, 407, 412, 417, 418, 419, 421, 422, 428, 447, 463, 475, 508, 511, 512, 514, 520, 526, 527, 528, 530, 531, 532, 534, 537, 540, 542, 575, 582, 590, 593, 605, 608, 610, 629, 646, 659, 675, 681, 686, 690, 694, 696, 715, 716, 741, 746, 751, 755, 756, 767, 768, 772, 773, 776, 781, 783, 797, 798, 808, 809, 810, 820, 821, 834, 850, 851, 852, 853, 854, 857, 867, 868, 869, 873, 878, 888, 902, 906, 907, 908, 909, 910, 911, 917, 918, 925, 929, 934, 936, 937, 939, 966, 970, 971, 972, 973, 975, 978, 979, 980, 983, 992, 995, 1005, 1006, 1030, 1031, 1032, 1037, 1038, 1040, 1041, 1050, 1053, 1070, 1072, 1073, 1074, 1079, 1081, 1083, 1084, 1087, 1105, 1106, 1107, 1108, 1121, 1123, 1125, 1130, 1132, 1136, 1138, 1139, 1140, 1142, 1143, 1145, 1146, 1148, 1149, 1151, 1152, 1154, 1155, 1157, 1158, 1162, 1166, 1173, 1177, 1178, 1182, 1187, 1198, 1205, 1207, 1209, 1210, 1211, 1213, 1215, 1216, 1217, 1219, 1220, 1229, 1232, 1233, 1239, 1243, 1244, 1245, 1246, 1254, 1255, 1256, 1257, 1258, 1261]

Checking the number of elements in the 'drama_index' list.

In [48]:
len(drama_index)
Out[48]:
200

Pulling the rows that belong to the Drama genre from the df3 dataframe using their index. These rows are used to create the demo_df dataframe.

In [198]:
demo_df = df3.iloc[drama_index]

Resetting the index of demo_df dataframe.

In [199]:
demo_df = demo_df.reset_index(drop=True)

The new dataframe demo_df.

In [200]:
demo_df
Out[200]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit_x Tickets Tickets_x Runtime Averagerating Company Studio Star Director Writer
0 Hugo Nov 23, 2011 Drama PG 180000000 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 ... $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Par. Asa Butterfield Martin Scorsese John Logan
1 Hugo Nov 23, 2011 Drama PG 180000000 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 ... $47,784 18004778 18,004,778 126.0 7.9 Paramount Pictures Par. Asa Butterfield Martin Scorsese John Logan
2 The Wolfman Feb 12, 2010 Drama R 150000000 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 ... $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Uni. Benicio Del Toro Joe Johnston Andrew Kevin Walker
3 Gravity Oct 4, 2013 Drama PG-13 110000000 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 ... $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. WB Sandra Bullock Alfonso Cuarón Alfonso Cuarón
4 Django Unchained Dec 25, 2012 Drama R 100000000 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 ... $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Wein. Jamie Foxx Quentin Tarantino Quentin Tarantino
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
195 Like Crazy Oct 28, 2011 Drama PG-13 250000 $250,000 3395391 $3,395,391 336000.0 $336,000 ... $3,478,400 372840 372,840 86.0 7.2 Paramount Vantage ParV Felicity Jones Drake Doremus Drake Doremus
196 The Canyons Aug 2, 2013 Drama R 250000 $250,000 59671 $59,671 NaN NaN ... $-187,625 6238 6,238 99.0 3.8 Prettybird IFC Lindsay Lohan Paul Schrader Bret Easton Ellis
197 Another Earth Jul 22, 2011 Drama PG-13 175000 $175,000 1321194 $1,321,194 456000.0 $456,000 ... $1,927,779 210278 210,278 92.0 7.0 Artists Public Domain FoxS Brit Marling Mike Cahill Mike Cahill
198 Sound of My Voice Apr 27, 2012 Drama R 135000 $135,000 408015 $408,015 NaN NaN ... $294,448 42945 42,945 85.0 6.6 Skyscraper Films FoxS Christopher Denham Zal Batmanglij Zal Batmanglij
199 A Ghost Story Jul 7, 2017 Drama R 100000 $100,000 1594798 $1,594,798 NaN NaN ... $2,669,782 276978 276,978 92.0 6.8 Sailor Bear A24 Casey Affleck David Lowery David Lowery

200 rows × 23 columns

Checking whether the dataframe has duplicate rows and deleting the rows that are duplicated.

Getting all the names of the movies from the demo_df dataframe to detect duplication.

In [201]:
demo_name = []
for i,x in enumerate(demo_df.Movie):demo_name.append(x)
print(demo_name) #showing the demo_name list 
['Hugo', 'Hugo', 'The Wolfman', 'Gravity', 'Django Unchained', 'Sing', 'Downsizing', 'Gone Girl', 'Contagion', 'Trouble with the Curve', 'Priest', 'Fifty Shades Darker', 'Fifty Shades Freed', 'Burlesque', 'Burlesque', 'Crimson Peak', 'Zero Dark Thirty', 'Creed II', 'The Post', 'Hereafter', 'Dream House', 'Upside Down', 'Upside Down', 'Upside Down', 'Anna Karenina', 'Anna Karenina', 'Arrival', 'Charlie St. Cloud', 'Fifty Shades of Grey', 'Bridge of Spies', 'The Impossible', 'Paranoia', 'Paranoia', 'Victor Frankenstein', 'Water for Elephants', 'The Master', 'The Master', 'The Master', 'Creed', 'Creed', 'Creed', 'Dolphin Tale', 'The Rite', 'Collateral Beauty', 'True Grit', 'The Tree of Life', 'Biutiful', 'The Longest Ride', 'Step Up Revolution', 'Flight', 'Extraordinary Measures', 'The Vow', 'The Age of Adaline', 'The Space Between Us', 'Safe Haven', 'Anonymous', 'The Best of Me', 'The Help', 'Dear John', 'The Lucky One', 'The Giver', 'Draft Day', 'Rings', 'Tulip Fever', 'Fences', 'The Ides of March', 'Nocturnal Animals', 'The Water Diviner', 'Stone', 'Stone', 'For Colored Girls', 'The Beaver', 'Wonder', 'The Last Song', 'Me Before You', 'The Debt', 'The Debt', 'The Light Between Oceans', 'Let Me In', 'Let Me In', 'By the Sea', 'By the Sea', 'The Book Thief', 'Labor Day', 'Midnight Special', 'Miss Sloane', 'A Quiet Place', 'A Quiet Place', 'Beastly', 'The Roommate', 'Remember Me', 'Remember Me', 'The Homesman', 'The Immigrant', 'The Woman in Black', 'Country Strong', 'One Day', 'One Day', 'One Day', 'One Day', 'One Day', 'One Day', 'Never Let Me Go', 'The Reluctant Fundamentalist', 'Suffragette', 'Black Swan', 'Ex Machina', 'The Perks of Being a Wallflower', 'Room', 'Chloe', 'Project Almanac', 'If Beale Street Could Talk', 'Wish Upon', 'Arbitrage', 'Stoker', 'Carol', 'If I Stay', 'Brooklyn', 'Brooklyn', 'Quartet', 'Hereditary', 'Everything, Everything', 'Mud', 'Mud', 'Coriolanus', 'Coriolanus', 'Amour', 'Melancholia', 'Melancholia', 'Ouija: Origin of Evil', 'Black or 
White', 'Manchester by the Sea', 'Yeh Jawaani Hai Deewani', 'The Bye Bye Man', 'Gifted', 'Gifted', 'Gifted', 'We Need to Talk About Kevin', 'Hesher', 'Shame', 'Shame', 'The Words', 'Lights Out', 'Lights Out', 'Lights Out', 'Lights Out', 'Still Alice', 'Addicted', 'Before I Fall', 'Everything Must Go', 'Rabbit Hole', 'Mommy', 'Take Shelter', 'Maggie', 'Maggie', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Boyhood', 'Stake Land', 'The Witch', 'Margin Call', 'Whiplash', 'War Room', 'Before Midnight', 'Ida', 'Courageous', 'Silent House', "Winter's Bone", 'The Florida Project', 'We Are Your Friends', 'Locke', 'The Babadook', 'Knock Knock', 'Knock Knock', 'Buried', 'Buried', 'The Lunchbox', 'Unsane', 'Mustang', 'Blue Valentine', 'Martha Marcy May Marlene', 'Palo Alto', 'I Origins', 'The Invitation', 'Like Crazy', 'Like Crazy', 'The Canyons', 'Another Earth', 'Sound of My Voice', 'A Ghost Story']

The function 'list_duplicates' finds the duplicated elements and returns each one together with the list of positions where it occurs.

In [202]:
def list_duplicates(seq):
    tally = defaultdict(list)
    for i,item in enumerate(seq):
        tally[item].append(i)
    return ((key,locs) for key,locs in tally.items() 
                            if len(locs)>1)
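
To make the behaviour of `list_duplicates` concrete, here is the same function run on a short sample. The import of `defaultdict` is included so that the sketch runs on its own; the sample list is the first five titles of `demo_name`.

```python
from collections import defaultdict


def list_duplicates(seq):
    # Map each item to the list of positions where it occurs.
    tally = defaultdict(list)
    for i, item in enumerate(seq):
        tally[item].append(i)
    # Keep only the items that occur more than once.
    return ((key, locs) for key, locs in tally.items() if len(locs) > 1)


# First five titles from demo_name, used here as a small sample.
sample_names = ['Hugo', 'Hugo', 'The Wolfman', 'Gravity', 'Django Unchained']
print(sorted(list_duplicates(sample_names)))
```

Only 'Hugo' is reported, with positions 0 and 1, because every other title in the sample occurs once.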

Using 'list_duplicates' to get all the duplicated movie names in the 'demo_name' list.

In [203]:
demo_dup = []
for dup in sorted(list_duplicates(demo_name)):
    demo_dup.append(dup)

Showing all the duplicated elements within the demo_df Drama dataframe

In [204]:
demo_dup
Out[204]:
[('A Quiet Place', [86, 87]),
 ('Anna', [155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166]),
 ('Anna Karenina', [24, 25]),
 ('Brooklyn', [117, 118]),
 ('Buried', [184, 185]),
 ('Burlesque', [13, 14]),
 ('By the Sea', [80, 81]),
 ('Coriolanus', [124, 125]),
 ('Creed', [38, 39, 40]),
 ('Gifted', [134, 135, 136]),
 ('Hugo', [0, 1]),
 ('Knock Knock', [182, 183]),
 ('Let Me In', [78, 79]),
 ('Lights Out', [142, 143, 144, 145]),
 ('Like Crazy', [194, 195]),
 ('Maggie', [153, 154]),
 ('Melancholia', [127, 128]),
 ('Mud', [122, 123]),
 ('One Day', [96, 97, 98, 99, 100, 101]),
 ('Paranoia', [31, 32]),
 ('Remember Me', [90, 91]),
 ('Shame', [139, 140]),
 ('Stone', [68, 69]),
 ('The Debt', [75, 76]),
 ('The Master', [35, 36, 37]),
 ('Upside Down', [21, 22, 23])]

Getting the index of every repeated row (each occurrence after the first) in the demo_df Drama dataframe

In [55]:
for i in demo_dup:print(i[1][1:])
[87]
[156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166]
[25]
[118]
[185]
[14]
[81]
[125]
[39, 40]
[135, 136]
[1]
[183]
[79]
[143, 144, 145]
[195]
[154]
[128]
[123]
[97, 98, 99, 100, 101]
[32]
[91]
[140]
[69]
[76]
[36, 37]
[22, 23]

Putting the indexes of the repeated rows in a single list so that they can be dropped later.

In [206]:
demo_dup_index = []
for i in demo_dup:demo_dup_index+=i[1][1:]
print(demo_dup_index) #showing the demo_dup_index list 
[87, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 25, 118, 185, 14, 81, 125, 39, 40, 135, 136, 1, 183, 79, 143, 144, 145, 195, 154, 128, 123, 97, 98, 99, 100, 101, 32, 91, 140, 69, 76, 36, 37, 22, 23]

Checking the number of elements in the 'demo_dup_index' list.

In [56]:
len(demo_dup_index)
Out[56]:
46

Dropping all the duplicated elements in the demo_df Drama dataframe

In [207]:
demo_df = demo_df.drop(demo_dup_index)
Drama_df = demo_df.reset_index(drop=True)#reseting the index
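
The drop keeps the first row for each title and removes the later repeats. Below is a minimal pure-Python sketch of that rule on a small sample. It assumes, as the notebook does, that rows sharing a title are the same movie; `repeat_positions` is a hypothetical helper.

```python
def repeat_positions(names):
    """Return the positions of every occurrence after the first."""
    seen = set()
    repeats = []
    for position, name in enumerate(names):
        if name in seen:
            repeats.append(position)
        else:
            seen.add(name)
    return repeats


sample_names = ['Hugo', 'Hugo', 'The Wolfman', 'Creed', 'Creed', 'Creed']
to_drop = repeat_positions(sample_names)
kept = [name for i, name in enumerate(sample_names) if i not in to_drop]
print(to_drop)
print(kept)
```

In the notebook, 46 of the 200 rows are repeats, which leaves 154 rows in Drama_df. pandas also provides `DataFrame.drop_duplicates(subset=..., keep='first')` for the same purpose.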

Putting the movies ratings into groups

In [209]:
grouped1 = []
for i in Drama_df.Rating:
    grouped1.append(i)
grouped1= Counter(grouped1)
grouped1
Out[209]:
Counter({'PG': 7, 'R': 67, 'PG-13': 76, 'Not Rated': 3, 'NC-17': 1})

The distribution of movies across the system ratings is very uneven: 'PG': 7, 'R': 67, 'PG-13': 76, 'Not Rated': 3, 'NC-17': 1. PG-13 has the most movies (76), and the objective is to make the distribution as even as possible. To achieve that, movies from the movie_df dataframe will be added to the Drama_df dataframe for the other system ratings, 'PG', 'R' and 'NC-17', to bring each of them as close to 76 movies as possible.

In [210]:
Drama_df
Out[210]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit_x Tickets Tickets_x Runtime Averagerating Company Studio Star Director Writer
0 Hugo Nov 23, 2011 Drama PG 180000000 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 ... $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Par. Asa Butterfield Martin Scorsese John Logan
1 The Wolfman Feb 12, 2010 Drama R 150000000 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 ... $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Uni. Benicio Del Toro Joe Johnston Andrew Kevin Walker
2 Gravity Oct 4, 2013 Drama PG-13 110000000 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 ... $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. WB Sandra Bullock Alfonso Cuarón Alfonso Cuarón
3 Django Unchained Dec 25, 2012 Drama R 100000000 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 ... $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Wein. Jamie Foxx Quentin Tarantino Quentin Tarantino
4 Sing Dec 21, 2016 Drama PG-13 75000000 $75,000,000 270329045 $270,329,045 363800000.0 $363,800,000 ... $559,454,789 63445479 63,445,479 98.0 7.1 TriStar Pictures Uni. Lorraine Bracco Richard Baskin Dean Pitchford
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
149 Like Crazy Oct 28, 2011 Drama PG-13 250000 $250,000 3395391 $3,395,391 336000.0 $336,000 ... $3,478,400 372840 372,840 86.0 6.7 Paramount Vantage ParV Felicity Jones Drake Doremus Drake Doremus
150 The Canyons Aug 2, 2013 Drama R 250000 $250,000 59671 $59,671 NaN NaN ... $-187,625 6238 6,238 99.0 3.8 Prettybird IFC Lindsay Lohan Paul Schrader Bret Easton Ellis
151 Another Earth Jul 22, 2011 Drama PG-13 175000 $175,000 1321194 $1,321,194 456000.0 $456,000 ... $1,927,779 210278 210,278 92.0 7.0 Artists Public Domain FoxS Brit Marling Mike Cahill Mike Cahill
152 Sound of My Voice Apr 27, 2012 Drama R 135000 $135,000 408015 $408,015 NaN NaN ... $294,448 42945 42,945 85.0 6.6 Skyscraper Films FoxS Christopher Denham Zal Batmanglij Zal Batmanglij
153 A Ghost Story Jul 7, 2017 Drama R 100000 $100,000 1594798 $1,594,798 NaN NaN ... $2,669,782 276978 276,978 92.0 6.8 Sailor Bear A24 Casey Affleck David Lowery David Lowery

154 rows × 23 columns

Before adding movies from the movie_df dataframe, the system ratings of all its drama movies are grouped and counted, to see whether there are enough movies to add to the Drama_df dataframe.

In [60]:
import collections

# Count the system ratings of all Drama movies in movie_df
grouped1 = []
for i, x in enumerate(movie_df.rating):
    if movie_df.genre[i] == 'Drama':
        grouped1.append(movie_df.rating[i])
grouped1 = collections.Counter(grouped1)
grouped1
Out[60]:
Counter({'R': 767,
         'PG': 155,
         nan: 35,
         'G': 11,
         'Not Rated': 113,
         'PG-13': 390,
         'Unrated': 27,
         'NC-17': 14,
         'X': 1,
         'TV-PG': 2,
         'TV-MA': 3})
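
The same grouping can also be done without an explicit loop. The cell below is a minimal sketch on a small toy dataframe (the rows are made up for illustration and are not the project's data), assuming the columns are named 'genre' and 'rating' as in movie_df.

```python
import pandas as pd

# Toy stand-in for movie_df (hypothetical rows, not the project's data).
toy_movies = pd.DataFrame({
    "genre":  ["Drama", "Drama", "Comedy", "Drama", "Drama"],
    "rating": ["R", "PG", "R", "R", None],
})

# Keep only the Drama rows, then count each rating. dropna=False keeps
# missing ratings as their own group, like the nan key in the Counter.
drama_rating_counts = (
    toy_movies.loc[toy_movies["genre"] == "Drama", "rating"]
    .value_counts(dropna=False)
)
print(drama_rating_counts)
```

On the real data the same two steps would be applied to movie_df instead of toy_movies.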

After grouping the movies by system rating, the ratings that did not have enough movies in Drama_df were 'PG', 'R', 'NC-17' and 'G'. 'R' needs 9 movies and 'PG' needs 59 movies. 'NC-17' needs 75 movies, but only 14 'NC-17' drama movies are available. 'G' needs 76 movies because Drama_df has none, but only 11 'G' drama movies are available. Movies to be added for each system rating: 'PG': 59, 'R': 9, 'G': 11 and 'NC-17': 14.

Getting the row positions of the 'PG'-rated drama movies from the United States in the movie_df dataframe, to add to the Drama_df dataframe.

In [211]:
index_pg = []
for i,x in enumerate(movie_df.genre):
    if x =='Drama' and movie_df.rating[i] == 'PG' and movie_df.country[i] == 'United States':
        index_pg.append(i)
print(index_pg) #showing the index_pg list 
[15, 24, 33, 38, 61, 63, 64, 114, 116, 119, 135, 150, 170, 225, 265, 297, 312, 339, 373, 382, 411, 458, 461, 473, 488, 503, 527, 560, 561, 583, 606, 621, 630, 663, 776, 793, 805, 815, 841, 897, 908, 1026, 1037, 1217, 1230, 1235, 1241, 1287, 1311, 1377, 1387, 1399, 1447, 1492, 1499, 1572, 1706, 1756, 1779, 1811, 1821, 1840, 1953, 2025, 2036, 2070, 2081, 2104, 2126, 2278, 2280, 2286, 2461, 2539, 2664, 2710, 2730, 2813, 2829, 2938, 2993, 3029, 3056, 3186, 3371, 3609, 3963, 4078, 4145, 4834, 4950, 4971, 4979, 4981, 5075, 5139, 5406, 5765, 5813, 5828, 6600, 6662, 6840, 7065, 7447, 7577]

Checking the number of elements in the 'index_pg' list.

In [61]:
len(index_pg)
Out[61]:
106
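
The loops that collect row positions for each rating all follow the same pattern, so they could be wrapped in one helper. This is only a sketch: the function name rating_positions and the toy dataframe are assumptions made for illustration, and the column names are taken from movie_df.

```python
import pandas as pd

# Toy stand-in for movie_df (hypothetical rows, not the project's data).
toy_movies = pd.DataFrame({
    "genre":   ["Drama", "Drama", "Action", "Drama"],
    "rating":  ["PG", "R", "PG", "PG"],
    "country": ["United States", "United States", "United States", "France"],
})

def rating_positions(df, rating, country=None):
    """Return the row positions of Drama movies with the given rating."""
    mask = (df["genre"] == "Drama") & (df["rating"] == rating)
    if country is not None:
        mask = mask & (df["country"] == country)
    # nonzero() returns positions, which is what DataFrame.iloc expects.
    return mask.to_numpy().nonzero()[0].tolist()

index_pg_toy = rating_positions(toy_movies, "PG", country="United States")
print(index_pg_toy)
```

Leaving out the country argument reproduces the unfiltered loops used for the 'G' and 'NC-17' ratings.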

Getting the row positions of the 'R'-rated drama movies from the United States in the movie_df dataframe, to add to the Drama_df dataframe.

In [212]:
index_r = []
for i,x in enumerate(movie_df.genre):
    if x =='Drama' and movie_df.rating[i] == 'R' and movie_df.country[i] == 'United States':
        index_r.append(i)
print(index_r) #showing the index_r list 
[13, 16, 81, 107, 143, 146, 160, 174, 180, 181, 188, 193, 211, 236, 248, 257, 273, 285, 304, 343, 348, 368, 408, 409, 415, 427, 443, 450, 454, 525, 533, 544, 553, 570, 585, 608, 610, 616, 620, 627, 631, 654, 656, 676, 697, 698, 741, 757, 784, 795, 802, 852, 853, 856, 861, 868, 870, 876, 911, 924, 951, 961, 962, 992, 1006, 1035, 1040, 1057, 1143, 1166, 1181, 1188, 1196, 1197, 1207, 1208, 1226, 1236, 1247, 1274, 1283, 1336, 1337, 1339, 1343, 1358, 1365, 1391, 1405, 1408, 1416, 1418, 1423, 1428, 1442, 1457, 1483, 1518, 1564, 1570, 1574, 1575, 1594, 1608, 1635, 1652, 1660, 1665, 1685, 1687, 1723, 1735, 1736, 1738, 1750, 1752, 1760, 1761, 1787, 1819, 1831, 1832, 1834, 1857, 1867, 1872, 1876, 1903, 1916, 1925, 1930, 1944, 1964, 1965, 1975, 1979, 1998, 1999, 2031, 2033, 2034, 2044, 2049, 2050, 2053, 2065, 2080, 2085, 2138, 2155, 2159, 2182, 2189, 2191, 2217, 2218, 2227, 2231, 2261, 2276, 2293, 2299, 2315, 2323, 2332, 2337, 2338, 2346, 2361, 2368, 2369, 2383, 2413, 2425, 2432, 2439, 2443, 2454, 2457, 2476, 2478, 2501, 2502, 2514, 2536, 2552, 2606, 2660, 2662, 2703, 2747, 2758, 2771, 2775, 2782, 2783, 2793, 2810, 2860, 2863, 2865, 2869, 2889, 2936, 2942, 2966, 3003, 3007, 3010, 3011, 3036, 3039, 3043, 3044, 3049, 3121, 3127, 3141, 3150, 3164, 3165, 3166, 3176, 3183, 3188, 3197, 3201, 3236, 3239, 3244, 3245, 3264, 3291, 3302, 3313, 3320, 3366, 3377, 3391, 3398, 3401, 3413, 3450, 3458, 3460, 3461, 3497, 3498, 3505, 3552, 3563, 3567, 3572, 3581, 3583, 3621, 3634, 3700, 3740, 3744, 3762, 3776, 3789, 3793, 3795, 3796, 3832, 3833, 3850, 3855, 3903, 3925, 3941, 3943, 3961, 3974, 3987, 3994, 4024, 4032, 4039, 4045, 4074, 4105, 4109, 4118, 4126, 4148, 4167, 4197, 4241, 4262, 4272, 4333, 4357, 4361, 4411, 4438, 4449, 4450, 4468, 4480, 4489, 4494, 4505, 4547, 4568, 4599, 4621, 4649, 4672, 4762, 4763, 4796, 4797, 4810, 4818, 4822, 4831, 4856, 4909, 4914, 4920, 4961, 5010, 5043, 5070, 5106, 5163, 5167, 5187, 5189, 5198, 5217, 5218, 5223, 5225, 5234, 5242, 5258, 5272, 5275, 5276, 5317, 
5362, 5385, 5386, 5401, 5416, 5489, 5508, 5511, 5523, 5533, 5538, 5559, 5574, 5620, 5640, 5650, 5676, 5691, 5711, 5798, 5799, 5805, 5820, 5835, 5836, 5840, 5842, 5846, 5915, 5916, 5950, 5984, 5990, 5997, 6044, 6057, 6065, 6068, 6134, 6147, 6170, 6186, 6192, 6212, 6213, 6231, 6236, 6258, 6323, 6335, 6362, 6402, 6405, 6408, 6412, 6426, 6447, 6450, 6489, 6512, 6540, 6553, 6575, 6587, 6610, 6635, 6638, 6649, 6669, 6710, 6735, 6868, 6891, 6893, 6897, 6927, 6985, 7005, 7039, 7067, 7085, 7092, 7098, 7102, 7117, 7132, 7142, 7147, 7156, 7172, 7173, 7180, 7193, 7196, 7217, 7234, 7248, 7267, 7279, 7331, 7378, 7408, 7418, 7422, 7443, 7461, 7462, 7495, 7507, 7513, 7530, 7550, 7592, 7593, 7658, 7661]

Checking the number of elements in the 'index_r' list.

In [62]:
len(index_r)
Out[62]:
460

Getting the row positions of the 'G'-rated drama movies in the movie_df dataframe, to add to the Drama_df dataframe.

In [213]:
index_g = []
for i,x in enumerate(movie_df.genre):
    if x =='Drama' and movie_df.rating[i] == 'G':index_g.append(i)
print(index_g) #showing the index_g list 
[321, 629, 1124, 1218, 1622, 1901, 2283, 2580, 2706, 3624, 4146]

Checking the number of elements in the 'index_g' list.

In [63]:
len(index_g)
Out[63]:
11

Getting the row positions of the 'NC-17'-rated drama movies in the movie_df dataframe, to add to the Drama_df dataframe.

In [214]:
index_nc = []
for i,x in enumerate(movie_df.genre):
    if x =='Drama' and movie_df.rating[i] == 'NC-17':index_nc.append(i)
print(index_nc) #showing the index_nc list 
[926, 1946, 2170, 2393, 2653, 2661, 2856, 3175, 4257, 4609, 5112, 5872, 6029, 6256]

Checking the number of elements in the 'index_nc' list.

In [64]:
len(index_nc)
Out[64]:
14

Listing the 'PG'-rated movies that are already in the Drama_df dataframe, so that duplicates are not created.

In [65]:
for i,x in enumerate(Drama_df.Movie):
    if Drama_df.Rating[i]=='PG':print(x)
Hugo
Dolphin Tale
Extraordinary Measures
Wonder
The Last Song
War Room
The Lunchbox
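
Instead of comparing titles by eye, candidates that are already present can be filtered out by title. The cell below is a minimal sketch with two small toy dataframes. The names existing, candidates and new_only are assumptions for illustration, and matching on the exact title string is itself an assumption (different films can share a title).

```python
import pandas as pd

# Toy stand-ins: titles already in Drama_df and candidate titles from movie_df.
existing = pd.DataFrame({"Movie": ["Hugo", "Wonder", "War Room"]})
candidates = pd.DataFrame({"movie": ["Urban Cowboy", "Wonder", "Cinderella"]})

# True for every candidate whose title is already in the existing dataframe.
already_present = candidates["movie"].isin(existing["Movie"])

# Keep only the candidates that would not create a duplicate.
new_only = candidates[~already_present].reset_index(drop=True)
print(new_only)
```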

Turning the row positions of the 'PG'-rated movies into a dataframe called demo_pg.

In [215]:
demo_pg = movie_df.iloc[index_pg]
demo_pg = demo_pg.reset_index(drop=True)

Checking the dataframe and getting the first five rows of the dataframe

In [216]:
demo_pg.head()
Out[216]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 Somewhere in Time PG Drama 1980 October 3, 1980 (United States) 7.2 27000.0 Jeannot Szwarc Richard Matheson Christopher Reeve United States 5100000.0 9709597.0 Rastar Pictures 103.0
1 Urban Cowboy PG Drama 1980 June 6, 1980 (United States) 6.4 14000.0 James Bridges Aaron Latham John Travolta United States NaN 46918287.0 Paramount Pictures 132.0
2 Cattle Annie and Little Britches PG Drama 1980 April 24, 1981 (United States) 6.1 604.0 Lamont Johnson David Eyre Scott Glenn United States 5100000.0 534816.0 Cattle Annie Productions 97.0
3 The Jazz Singer PG Drama 1980 December 19, 1980 (United States) 5.9 4000.0 Richard Fleischer Samson Raphaelson Laurence Olivier United States NaN 27118000.0 EMI Films 115.0
4 The Competition PG Drama 1980 December 3, 1980 (United States) 6.7 1900.0 Joel Oliansky Joel Oliansky Richard Dreyfuss United States NaN 14287755.0 Rastar Films 123.0

Turning the row positions of the 'NC-17'-rated movies into a dataframe called demo_nc.

In [217]:
demo_nc = movie_df.iloc[index_nc]
demo_nc = demo_nc.reset_index(drop=True)

Checking the demo_nc dataframe.

In [218]:
demo_nc
Out[218]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 Matador NC-17 Drama 1986 March 7, 1986 (Spain) 7.0 11000.0 Pedro Almodóvar Pedro Almodóvar Assumpta Serna Spain NaN 286126.0 Compañía Iberoamericana de TV 110.0
1 Whore NC-17 Drama 1991 October 18, 1991 (United States) 5.6 3500.0 Ken Russell David Hines Theresa Russell United States NaN 1008404.0 Cheap Date 85.0
2 Tokyo Decadence NC-17 Drama 1992 April 30, 1993 (United States) 6.0 3000.0 Ryû Murakami Ryû Murakami Miho Nikaido Japan NaN 277845.0 Cinemabrain 112.0
3 Wide Sargasso Sea NC-17 Drama 1993 April 16, 1993 (United States) 5.7 1900.0 John Duigan Jan Sharp Karina Lombard Australia NaN 1614784.0 Laughing Kookaburra Productions 98.0
4 Kids NC-17 Drama 1995 September 1, 1995 (United States) 7.1 75000.0 Larry Clark Harmony Korine Leo Fitzpatrick United States 1500000.0 7412216.0 Guys Upstairs 91.0
5 Showgirls NC-17 Drama 1995 September 22, 1995 (United States) 4.9 64000.0 Paul Verhoeven Joe Eszterhas Elizabeth Berkley France 45000000.0 20358624.0 Carolco Pictures 128.0
6 Crash NC-17 Drama 1996 March 21, 1997 (United States) 6.4 54000.0 David Cronenberg J.G. Ballard James Spader Canada 9000000.0 2671291.0 Alliance Communications Corporation 100.0
7 Bent NC-17 Drama 1997 November 26, 1997 (United States) 7.2 7900.0 Sean Mathias Martin Sherman Lothaire Bluteau United Kingdom NaN 496059.0 Channel Four Films 105.0
8 The Dreamers NC-17 Drama 2003 February 20, 2004 (United States) 7.2 114000.0 Bernardo Bertolucci Gilbert Adair Michael Pitt United Kingdom 15000000.0 24152155.0 Recorded Picture Company (RPC) 115.0
9 Ma mère NC-17 Drama 2004 May 19, 2004 (France) 5.1 6600.0 Christophe Honoré Georges Bataille Isabelle Huppert France NaN 1510052.0 Gemini Films 110.0
10 Lust, Caution NC-17 Drama 2007 October 26, 2007 (United States) 7.5 38000.0 Ang Lee Eileen Chang Tony Chiu-Wai Leung Taiwan 15000000.0 67091915.0 Haishang Films 157.0
11 Shame NC-17 Drama 2011 January 13, 2012 (United Kingdom) 7.2 187000.0 Steve McQueen Steve McQueen Michael Fassbender United Kingdom 6500000.0 19123767.0 Fox Searchlight Pictures 101.0
12 Elles NC-17 Drama 2011 February 1, 2012 (France) 5.6 6700.0 Malgorzata Szumowska Tine Byrckel Juliette Binoche France NaN 3822241.0 Slot Machine 99.0
13 Blue Is the Warmest Colour NC-17 Drama 2013 October 9, 2013 (Belgium) 7.7 142000.0 Abdellatif Kechiche Abdellatif Kechiche Léa Seydoux France NaN 19465835.0 Quat'sous Films 180.0

Turning the row positions of the 'R'-rated movies into a dataframe called demo_r, keeping only the first 11 rows.

In [219]:
demo_r = movie_df.iloc[index_r]
demo_r = demo_r.reset_index(drop=True)
demo_r = demo_r[:11]

Checking the demo_r dataframe

In [220]:
demo_r
Out[220]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 Ordinary People R Drama 1980 September 19, 1980 (United States) 7.7 49000.0 Robert Redford Judith Guest Donald Sutherland United States 6000000.0 54766923.0 Paramount Pictures 124.0
1 Fame R Drama 1980 May 16, 1980 (United States) 6.6 21000.0 Alan Parker Christopher Gore Eddie Barth United States NaN 21202829.0 Metro-Goldwyn-Mayer (MGM) 134.0
2 Windows R Drama 1980 January 18, 1980 (United States) 4.8 643.0 Gordon Willis Barry Siegel Talia Shire United States NaN 2128395.0 Mike Lobell Productions 96.0
3 Endless Love R Drama 1981 July 17, 1981 (United States) 4.9 7600.0 Franco Zeffirelli Scott Spencer Brooke Shields United States NaN 32492674.0 PolyGram Filmed Entertainment 116.0
4 Ghost Story R Drama 1981 December 18, 1981 (United States) 6.3 7900.0 John Irvin Peter Straub Craig Wasson United States NaN 23371905.0 Universal Pictures 110.0
5 One from the Heart R Drama 1981 February 11, 1982 (United States) 6.5 5700.0 Francis Ford Coppola Armyan Bernstein Frederic Forrest United States 26000000.0 636796.0 Zoetrope Studios 107.0
6 The Hand R Drama 1981 April 24, 1981 (United States) 5.5 5700.0 Oliver Stone Marc Brandel Michael Caine United States NaN 2447576.0 Orion Pictures 104.0
7 Pennies from Heaven R Drama 1981 January 1, 1982 (United States) 6.5 5300.0 Herbert Ross Dennis Potter Steve Martin United States 22000000.0 9171289.0 Metro-Goldwyn-Mayer (MGM) 108.0
8 Zoot Suit R Drama 1981 January 1, 1982 (United States) 6.8 1100.0 Luis Valdez Luis Valdez Daniel Valdez United States 2700000.0 3256082.0 Universal Pictures 103.0
9 Rich and Famous R Drama 1981 October 9, 1981 (United States) 5.9 1600.0 George Cukor Gerald Ayres Jacqueline Bisset United States NaN 14492125.0 Jaquet 117.0
10 Raggedy Man R Drama 1981 September 18, 1981 (United States) 6.8 1400.0 Jack Fisk William D. Wittliff Sissy Spacek United States NaN 1976198.0 Universal Pictures 94.0

Turning the row positions of the 'G'-rated movies into a dataframe called demo_g.

In [221]:
demo_g = movie_df.iloc[index_g]
demo_g = demo_g.reset_index(drop=True)

Checking the demo_g dataframe

In [222]:
demo_g
Out[222]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 La traviata G Drama 1982 February 18, 1983 (Italy) 7.2 1300.0 Franco Zeffirelli Francesco Maria Piave Teresa Stratas Netherlands NaN 3783329.0 Accent Films B.V. 109.0
1 A Sunday in the Country G Drama 1984 April 11, 1984 (France) 7.6 2500.0 Bertrand Tavernier Pierre Bost Louis Ducreux France NaN 2411143.0 Films A2 90.0
2 Babette's Feast G Drama 1987 March 4, 1988 (United States) 7.8 19000.0 Gabriel Axel Karen Blixen Stéphane Audran Denmark NaN 4637920.0 Panorama Film A/S 103.0
3 Little Dorrit G Drama 1987 October 21, 1988 (United States) 7.3 1000.0 Christine Edzard Charles Dickens Derek Jacobi United Kingdom NaN 1025228.0 Sands 357.0
4 Prancer G Drama 1989 November 17, 1989 (United States) 6.4 4800.0 John D. Hancock Greg Taylor Sam Elliott United States NaN 18587135.0 Cineplex Odeon Films 103.0
5 Wild Hearts Can't Be Broken G Drama 1991 May 24, 1991 (United States) 7.2 5000.0 Steve Miner Matt Williams Gabrielle Anwar United States NaN 7294835.0 Walt Disney Pictures 88.0
6 The Secret Garden G Drama 1993 August 13, 1993 (United States) 7.3 38000.0 Agnieszka Holland Frances Hodgson Burnett Kate Maberly United Kingdom 18000000.0 31181347.0 Warner Bros. 101.0
7 Through the Olive Trees G Drama 1994 January 25, 1995 (France) 7.8 7100.0 Abbas Kiarostami Abbas Kiarostami Mohamad Ali Keshavarz Iran NaN NaN Abbas Kiarostami Productions 103.0
8 A Little Princess G Drama 1995 May 19, 1995 (United States) 7.7 33000.0 Alfonso Cuarón Frances Hodgson Burnett Liesel Matthews United States 17000000.0 10015449.0 Warner Bros. 97.0
9 The Winslow Boy G Drama 1999 October 29, 1999 (United Kingdom) 7.3 7500.0 David Mamet Terence Rattigan Rebecca Pidgeon United Kingdom NaN 3957934.0 Winslow Partners Ltd. 104.0
10 The Rookie G Drama 2002 March 29, 2002 (United States) 6.9 33000.0 John Lee Hancock Mike Rich Dennis Quaid United States 22000000.0 80693537.0 98 MPH Productions 127.0

The demo_pg dataframe has 106 rows. From those rows, 59 have to be chosen to meet the target for the Drama_df dataframe. These are the chosen row positions.

In [2]:
demo_pg_index = [0,1,101,102,103,104,105,3,2,86,87,88,89,90,91,92,93,94,95,
                96,97,98,100,75,76,77,78,79,82,83,85,74,73,71,70,69,68,67,66,65,
                5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,24,25]
print(demo_pg_index) #showing the demo_pg_index list 
[0, 1, 101, 102, 103, 104, 105, 3, 2, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 100, 75, 76, 77, 78, 79, 82, 83, 85, 74, 73, 71, 70, 69, 68, 67, 66, 65, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25]

Checking the number of elements in the 'demo_pg_index' list.

In [225]:
len(demo_pg_index)
Out[225]:
60
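
Because the positions are picked by hand, a quick check of the list can catch repeated positions, positions outside the dataframe, or a count that differs from the target. This is a sketch only. The helper check_positions and the example values are hypothetical and not part of the project.

```python
def check_positions(positions, n_rows, target):
    """Summarise a hand-picked list of row positions."""
    return {
        "count": len(positions),
        # positions that appear more than once
        "duplicates": sorted({p for p in positions if positions.count(p) > 1}),
        # positions that do not exist in a dataframe with n_rows rows
        "out_of_range": [p for p in positions if not 0 <= p < n_rows],
        "matches_target": len(positions) == target,
    }

# Hypothetical example: 5 positions picked for a 10-row dataframe, target of 4.
report = check_positions([0, 1, 5, 5, 12], n_rows=10, target=4)
print(report)
```

For the list above, the call would be check_positions(demo_pg_index, n_rows=len(demo_pg), target=59).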

This is the Worldwide Gross of the newly selected 'PG'-rated movies that will be added to the Drama_df dataframe.

In [3]:
worldwide_pg = [9709597, 46918287,542351353,73986904,305937718,216601214,38102988,27118000,
            534816,37306334,47494916,19344615,38741732,114830111,43545364,18948425,3438735,
            137587063,64605762,33473297,89137047,8526288,64667874,106269971,35656130,3987768,
            7025496,152036382,171120329,13835130,14859394,134582776,6101815,
            63954968,10769960,32255440,15164458,127956187,2819485,43440294,17815212,157297525,
            35856053,119285432,40716963,14920781,3281232,14923752,125052686,549368315,6668025,199078,
            64892670,4786789,8443124,2044892,2400000,1705908,80008942,48000000]
print(worldwide_pg) #showing the worldwide_pg list 
[9709597, 46918287, 542351353, 73986904, 305937718, 216601214, 38102988, 27118000, 534816, 37306334, 47494916, 19344615, 38741732, 114830111, 43545364, 18948425, 3438735, 137587063, 64605762, 33473297, 89137047, 8526288, 64667874, 106269971, 35656130, 3987768, 7025496, 152036382, 171120329, 13835130, 14859394, 134582776, 6101815, 63954968, 10769960, 32255440, 15164458, 127956187, 2819485, 43440294, 17815212, 157297525, 35856053, 119285432, 40716963, 14920781, 3281232, 14923752, 125052686, 549368315, 6668025, 199078, 64892670, 4786789, 8443124, 2044892, 2400000, 1705908, 80008942, 48000000]

Checking the number of elements in the 'worldwide_pg' list.

In [73]:
len(worldwide_pg)
Out[73]:
60

This is the Domestic Gross of the newly selected 'PG'-rated movies that will be added to the Drama_df dataframe.

In [4]:
domestic_pg = [0,0,201151353,67790117,132422809,108101214,34700142,0,0,27796042,41281092,19161999,
              32751093,52330111,0,18848430,0,82272442,31664162,33456317,62950384,3493000,60705732,82569971,
              0,0,0,104636382,100920329,10162034,0,43182776,0,22954968,0,0,0,55956187,0,0,0,37686805,
              0,0,0,0,0,0,125049125,218815487,0,0,0,0,0,1537122,0,705908,80000000,0]
print(domestic_pg) #showing the domestic_pg list 
[0, 0, 201151353, 67790117, 132422809, 108101214, 34700142, 0, 0, 27796042, 41281092, 19161999, 32751093, 52330111, 0, 18848430, 0, 82272442, 31664162, 33456317, 62950384, 3493000, 60705732, 82569971, 0, 0, 0, 104636382, 100920329, 10162034, 0, 43182776, 0, 22954968, 0, 0, 0, 55956187, 0, 0, 0, 37686805, 0, 0, 0, 0, 0, 0, 125049125, 218815487, 0, 0, 0, 0, 0, 1537122, 0, 705908, 80000000, 0]

Checking the number of elements in the 'domestic_pg' list.

In [75]:
len(domestic_pg)
Out[75]:
60

This is the Foreign Gross of the newly selected 'PG'-rated movies that will be added to the Drama_df dataframe. It is calculated by subtracting the Domestic Gross from the Worldwide Gross of each movie. Movies whose Domestic Gross is recorded as 0 are given a Foreign Gross of 0.

In [227]:
foreign_pg = []
for i,x in enumerate(worldwide_pg):
    if domestic_pg[i] == 0:foreign_pg.append(0)
    else:foreign_pg.append(x-domestic_pg[i])
print(foreign_pg) #showing the foreign_pg list 
[0, 0, 341200000, 6196787, 173514909, 108500000, 3402846, 0, 0, 9510292, 6213824, 182616, 5990639, 62500000, 0, 99995, 0, 55314621, 32941600, 16980, 26186663, 5033288, 3962142, 23700000, 0, 0, 0, 47400000, 70200000, 3673096, 0, 91400000, 0, 41000000, 0, 0, 0, 72000000, 0, 0, 0, 119610720, 0, 0, 0, 0, 0, 0, 3561, 330552828, 0, 0, 0, 0, 0, 507770, 0, 1000000, 8942, 0]

Checking the number of elements in the 'foreign_pg' list.

In [77]:
len(foreign_pg)
Out[77]:
60
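
The rule can be checked on the first four movies. The sketch below uses the first four values of worldwide_pg and domestic_pg from the cells above. The variable names ending in _sample are introduced here for illustration only.

```python
# First four values of worldwide_pg and domestic_pg from the cells above.
worldwide_sample = [9709597, 46918287, 542351353, 73986904]
domestic_sample = [0, 0, 201151353, 67790117]

# Foreign gross is worldwide minus domestic, except that a domestic gross
# of 0 (no domestic figure recorded) gives a foreign gross of 0.
foreign_sample = [
    0 if domestic == 0 else worldwide - domestic
    for worldwide, domestic in zip(worldwide_sample, domestic_sample)
]
print(foreign_sample)
```

The result matches the first four entries of the foreign_pg list printed above.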

Creating the demo1_pg dataframe with the newly chosen 'PG'-rated movies that will be added to the Drama_df dataframe.

In [228]:
demo1_pg = demo_pg.iloc[demo_pg_index]

Resetting the index in the demo1_pg dataframe

In [229]:
demo1_pg = demo1_pg.reset_index(drop=True)

Checking the dataframe and getting the first five rows of the dataframe

In [230]:
demo1_pg.head()
Out[230]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 Somewhere in Time PG Drama 1980 October 3, 1980 (United States) 7.2 27000.0 Jeannot Szwarc Richard Matheson Christopher Reeve United States 5100000.0 9709597.0 Rastar Pictures 103.0
1 Urban Cowboy PG Drama 1980 June 6, 1980 (United States) 6.4 14000.0 James Bridges Aaron Latham John Travolta United States NaN 46918287.0 Paramount Pictures 132.0
2 Cinderella PG Drama 2015 March 13, 2015 (United States) 6.9 165000.0 Kenneth Branagh Chris Weitz Lily James United States 95000000.0 542358331.0 Allison Shearmur Productions 105.0
3 War Room PG Drama 2015 August 28, 2015 (United States) 6.5 14000.0 Alex Kendrick Alex Kendrick Priscilla C. Shirer United States 3000000.0 73256266.0 FaithStep Films 120.0
4 Wonder PG Drama 2017 November 17, 2017 (United States) 8.0 150000.0 Stephen Chbosky Stephen Chbosky Jacob Tremblay United States 20000000.0 306209289.0 Lionsgate 113.0

The 'budget' column in the demo1_pg dataframe contains 'NaN' values. The cell below replaces every 'NaN' in the demo1_pg dataframe with 0.

In [231]:
demo1_pg = demo1_pg.fillna(0)

Checking the dataframe and getting the first five rows of the dataframe

In [232]:
demo1_pg.head()
Out[232]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 Somewhere in Time PG Drama 1980 October 3, 1980 (United States) 7.2 27000.0 Jeannot Szwarc Richard Matheson Christopher Reeve United States 5100000.0 9709597.0 Rastar Pictures 103.0
1 Urban Cowboy PG Drama 1980 June 6, 1980 (United States) 6.4 14000.0 James Bridges Aaron Latham John Travolta United States 0.0 46918287.0 Paramount Pictures 132.0
2 Cinderella PG Drama 2015 March 13, 2015 (United States) 6.9 165000.0 Kenneth Branagh Chris Weitz Lily James United States 95000000.0 542358331.0 Allison Shearmur Productions 105.0
3 War Room PG Drama 2015 August 28, 2015 (United States) 6.5 14000.0 Alex Kendrick Alex Kendrick Priscilla C. Shirer United States 3000000.0 73256266.0 FaithStep Films 120.0
4 Wonder PG Drama 2017 November 17, 2017 (United States) 8.0 150000.0 Stephen Chbosky Stephen Chbosky Jacob Tremblay United States 20000000.0 306209289.0 Lionsgate 113.0

Getting the row positions of all the 0 values in the 'budget' column of the demo1_pg dataframe.

In [233]:
nan_index = []
for i,x in enumerate(demo1_pg.budget):
    if x == 0.0 :nan_index.append(i)
print(nan_index) #showing the nan_index list 
[1, 7, 16, 21, 37, 40, 41, 43, 45, 47, 49, 50, 53, 54, 55, 56, 57]

Checking the number of elements in the 'nan_index' list.

In [81]:
len(nan_index)
Out[81]:
17

The actual budgets of the movies in the demo1_pg dataframe whose budget was set to 0.

In [5]:
budget = [10000000.0, 422000, 9000000.0, 11000000.0, 20000000.0,
          5000000.0, 7000000.0, 15000000.0, 28300000.0, 7500000.0,
          5000000.0, 9000000.0, 5000000.0, 4500000.0, 4500000.0,
          8000000.0, 16000000.0]
print(budget) #showing the budget list 
[10000000.0, 422000, 9000000.0, 11000000.0, 20000000.0, 5000000.0, 7000000.0, 15000000.0, 28300000.0, 7500000.0, 5000000.0, 9000000.0, 5000000.0, 4500000.0, 4500000.0, 8000000.0, 16000000.0]

Replacing each 0 in the 'budget' column of the demo1_pg dataframe with the actual budget of the movie.

In [236]:
for i,x in enumerate(nan_index):
    demo1_pg.loc[x ,'budget'] = budget[i]

Checking the number of rows in the demo1_pg dataframe.

In [83]:
len(demo1_pg)
Out[83]:
60
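
The missing budgets could also be filled in one step, without first replacing 'NaN' with 0 (which would also change any other column that has missing values). The cell below is a sketch on a toy dataframe. The rows and the looked_up list are made up for illustration, and the approach assumes the looked-up budgets are listed in the same order as the missing rows.

```python
import pandas as pd

# Toy stand-in for demo1_pg (hypothetical rows, not the project's data).
toy = pd.DataFrame({
    "movie":  ["A", "B", "C", "D"],
    "budget": [5100000.0, None, 95000000.0, None],
})
# Budgets looked up by hand, in the same order as the missing rows.
looked_up = [10000000.0, 422000.0]

# Mark the rows with a missing budget and check there is one value for each.
missing = toy["budget"].isna()
if missing.sum() != len(looked_up):
    raise ValueError("need exactly one looked-up budget per missing row")

# Fill only the missing budgets; other columns are left untouched.
toy.loc[missing, "budget"] = looked_up
print(toy)
```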

This is the Profit of the newly selected 'PG'-rated movies that will be added to the Drama_df dataframe. It is calculated by subtracting the Budget of each movie from its Worldwide Gross.

In [238]:
profit_pg = []
for i,x in enumerate(worldwide_pg):
    profit_pg.append(x-demo1_pg.budget[i])
print(profit_pg) #showing the profit_pg list 
[4609597.0, 36918287.0, 447351353.0, 70986904.0, 285937718.0, 176601214.0, 33102988.0, 26696000.0, -4565184.0, -34693666.0, 35694916.0, 4344615.0, 6741732.0, 74830111.0, -21454636.0, 10948425.0, -5561265.0, 120587063.0, 34605762.0, 32973297.0, 69137047.0, -2473712.0, 62667874.0, 83269971.0, -9343870.0, -11012232.0, -2974504.0, 120036382.0, 81120329.0, 3835130.0, -12140606.0, 118582776.0, 3101815.0, 48954968.0, -14230040.0, -1744560.0, 5164458.0, 107956187.0, -12180515.0, 31440294.0, 12815212.0, 150297525.0, 21856053.0, 104285432.0, 28716963.0, -13379219.0, -4718768.0, 7423752.0, 108052686.0, 544368315.0, -2331975.0, -14800922.0, 42892670.0, -213211.0, 3943124.0, -2455108.0, -5600000.0, -14294092.0, 71808942.0, 20000000.0]

Checking the number of elements in the 'profit_pg' list.

In [85]:
len(profit_pg)
Out[85]:
60
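
The profit calculation can be written as a single column subtraction. The sketch below uses the worldwide gross and budget of the first three movies shown in the cells above. The dataframe name sample is introduced here for illustration.

```python
import pandas as pd

# First three movies, with the worldwide gross and budget values shown above.
sample = pd.DataFrame({
    "movie": ["Somewhere in Time", "Urban Cowboy", "Cinderella"],
    "Worldwide_Gross": [9709597, 46918287, 542351353],
    "budget": [5100000.0, 10000000.0, 95000000.0],
})

# Profit = Worldwide Gross - Budget, computed for every row at once.
sample["Profit"] = sample["Worldwide_Gross"] - sample["budget"]
print(sample[["movie", "Profit"]])
```

The three values agree with the first three entries of the profit_pg list.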

This is the Number of Tickets Sold for the newly selected 'PG'-rated movies that will be added to the Drama_df dataframe. It is calculated by dividing the Worldwide Gross by 10, which is used as the average ticket price worldwide.

In [239]:
no_tickets_pg = []
for i in worldwide_pg:
    no_tickets_pg.append(round(i/10))
print(no_tickets_pg) #showing the no_tickets_pg list 
[970960, 4691829, 54235135, 7398690, 30593772, 21660121, 3810299, 2711800, 53482, 3730633, 4749492, 1934462, 3874173, 11483011, 4354536, 1894842, 343874, 13758706, 6460576, 3347330, 8913705, 852629, 6466787, 10626997, 3565613, 398777, 702550, 15203638, 17112033, 1383513, 1485939, 13458278, 610182, 6395497, 1076996, 3225544, 1516446, 12795619, 281948, 4344029, 1781521, 15729752, 3585605, 11928543, 4071696, 1492078, 328123, 1492375, 12505269, 54936832, 666802, 19908, 6489267, 478679, 844312, 204489, 240000, 170591, 8000894, 4800000]

Checking the number of elements in the 'no_tickets_pg' list.

In [86]:
len(no_tickets_pg)
Out[86]:
60
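
One detail of this calculation is how Python's built-in round treats values that end in exactly .5: it rounds to the nearest even number. The sketch below shows this on three values from worldwide_pg. The constant AVERAGE_TICKET_PRICE and the helper tickets_sold are names introduced here for illustration.

```python
AVERAGE_TICKET_PRICE = 10  # the average ticket price assumed in this project

def tickets_sold(worldwide_gross, price=AVERAGE_TICKET_PRICE):
    """Estimate tickets sold from the worldwide gross."""
    # round() sends exact .5 cases to the nearest even integer
    return round(worldwide_gross / price)

# 18948425 / 10 = 1894842.5 rounds down to the even 1894842,
# while 3438735 / 10 = 343873.5 rounds up to the even 343874.
sample_tickets = [tickets_sold(g) for g in (9709597, 18948425, 3438735)]
print(sample_tickets)
```

These are the same values that appear in the no_tickets_pg list for those three movies.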

After creating the columns, they are added to the 'PG'-rated movie dataframe demo1_pg, which will later be added to the Drama_df dataframe.

In [240]:
demo1_pg['Worldwide_Gross'] = worldwide_pg
demo1_pg["Foreign_Gross"] = foreign_pg
demo1_pg['Domestic_Gross'] = domestic_pg
demo1_pg["Profit"] = profit_pg
demo1_pg['Tickets'] = no_tickets_pg

Showing the first five rows of the 'PG'-rated movie dataframe demo1_pg with the new columns added.

In [241]:
demo1_pg.head()
Out[241]:
movie rating genre year released score votes director writer star country budget gross company runtime Worldwide_Gross Foreign_Gross Domestic_Gross Profit Tickets
0 Somewhere in Time PG Drama 1980 October 3, 1980 (United States) 7.2 27000.0 Jeannot Szwarc Richard Matheson Christopher Reeve United States 5100000.0 9709597.0 Rastar Pictures 103.0 9709597 0 0 4609597.0 970960
1 Urban Cowboy PG Drama 1980 June 6, 1980 (United States) 6.4 14000.0 James Bridges Aaron Latham John Travolta United States 10000000.0 46918287.0 Paramount Pictures 132.0 46918287 0 0 36918287.0 4691829
2 Cinderella PG Drama 2015 March 13, 2015 (United States) 6.9 165000.0 Kenneth Branagh Chris Weitz Lily James United States 95000000.0 542358331.0 Allison Shearmur Productions 105.0 542351353 341200000 201151353 447351353.0 54235135
3 War Room PG Drama 2015 August 28, 2015 (United States) 6.5 14000.0 Alex Kendrick Alex Kendrick Priscilla C. Shirer United States 3000000.0 73256266.0 FaithStep Films 120.0 73986904 6196787 67790117 70986904.0 7398690
4 Wonder PG Drama 2017 November 17, 2017 (United States) 8.0 150000.0 Stephen Chbosky Stephen Chbosky Jacob Tremblay United States 20000000.0 306209289.0 Lionsgate 113.0 305937718 173514909 132422809 285937718.0 30593772

Creating the Foreign_Gross_x column by formatting the foreign gross values as currency strings, to be added to the demo1_pg dataframe.

In [242]:
foreign_gross_pgx = []
for i in foreign_pg:
    foreign_gross_pgx.append("${:,.0f}".format(i))
print(foreign_gross_pgx) #showing the foreign_gross_pgx list 
['$0', '$0', '$341,200,000', '$6,196,787', '$173,514,909', '$108,500,000', '$3,402,846', '$0', '$0', '$9,510,292', '$6,213,824', '$182,616', '$5,990,639', '$62,500,000', '$0', '$99,995', '$0', '$55,314,621', '$32,941,600', '$16,980', '$26,186,663', '$5,033,288', '$3,962,142', '$23,700,000', '$0', '$0', '$0', '$47,400,000', '$70,200,000', '$3,673,096', '$0', '$91,400,000', '$0', '$41,000,000', '$0', '$0', '$0', '$72,000,000', '$0', '$0', '$0', '$119,610,720', '$0', '$0', '$0', '$0', '$0', '$0', '$3,561', '$330,552,828', '$0', '$0', '$0', '$0', '$0', '$507,770', '$0', '$1,000,000', '$8,942', '$0']

Checking the number of elements in the 'foreign_gross_pgx' list.

In [89]:
len(foreign_gross_pgx)
Out[89]:
60

Creating the Worldwide_Gross_x column by formatting the worldwide gross values as currency strings, to be added to the demo1_pg dataframe.

In [243]:
worldwide_gross_pgx = []
for i in worldwide_pg:
    worldwide_gross_pgx.append("${:,.0f}".format(i))
print(worldwide_gross_pgx) #showing the worldwide_gross_pgx list 
['$9,709,597', '$46,918,287', '$542,351,353', '$73,986,904', '$305,937,718', '$216,601,214', '$38,102,988', '$27,118,000', '$534,816', '$37,306,334', '$47,494,916', '$19,344,615', '$38,741,732', '$114,830,111', '$43,545,364', '$18,948,425', '$3,438,735', '$137,587,063', '$64,605,762', '$33,473,297', '$89,137,047', '$8,526,288', '$64,667,874', '$106,269,971', '$35,656,130', '$3,987,768', '$7,025,496', '$152,036,382', '$171,120,329', '$13,835,130', '$14,859,394', '$134,582,776', '$6,101,815', '$63,954,968', '$10,769,960', '$32,255,440', '$15,164,458', '$127,956,187', '$2,819,485', '$43,440,294', '$17,815,212', '$157,297,525', '$35,856,053', '$119,285,432', '$40,716,963', '$14,920,781', '$3,281,232', '$14,923,752', '$125,052,686', '$549,368,315', '$6,668,025', '$199,078', '$64,892,670', '$4,786,789', '$8,443,124', '$2,044,892', '$2,400,000', '$1,705,908', '$80,008,942', '$48,000,000']

Checking the number of elements in the 'worldwide_gross_pgx' list.

In [90]:
len(worldwide_gross_pgx)
Out[90]:
60

Creating the Domestic_Gross_x column by formatting the domestic gross values as currency strings, to be added to the demo1_pg dataframe.

In [244]:
domestic_gross_pgx = []
for i in domestic_pg:
    domestic_gross_pgx.append("${:,.0f}".format(i))
print(domestic_gross_pgx) #showing the domestic_gross_pgx list 
['$0', '$0', '$201,151,353', '$67,790,117', '$132,422,809', '$108,101,214', '$34,700,142', '$0', '$0', '$27,796,042', '$41,281,092', '$19,161,999', '$32,751,093', '$52,330,111', '$0', '$18,848,430', '$0', '$82,272,442', '$31,664,162', '$33,456,317', '$62,950,384', '$3,493,000', '$60,705,732', '$82,569,971', '$0', '$0', '$0', '$104,636,382', '$100,920,329', '$10,162,034', '$0', '$43,182,776', '$0', '$22,954,968', '$0', '$0', '$0', '$55,956,187', '$0', '$0', '$0', '$37,686,805', '$0', '$0', '$0', '$0', '$0', '$0', '$125,049,125', '$218,815,487', '$0', '$0', '$0', '$0', '$0', '$1,537,122', '$0', '$705,908', '$80,000,000', '$0']

Checking the number of elements in the 'domestic_gross_pgx' list.

In [245]:
len(domestic_gross_pgx)
Out[245]:
60

Creating the Profit_x column by formatting the profit values as currency strings, to be added to the demo1_pg dataframe.

In [246]:
profit_pgx = []
for i in profit_pg:
    profit_pgx.append("${:,.0f}".format(i))
print(profit_pgx) #showing the profit_pgx list 
['$4,609,597', '$36,918,287', '$447,351,353', '$70,986,904', '$285,937,718', '$176,601,214', '$33,102,988', '$26,696,000', '$-4,565,184', '$-34,693,666', '$35,694,916', '$4,344,615', '$6,741,732', '$74,830,111', '$-21,454,636', '$10,948,425', '$-5,561,265', '$120,587,063', '$34,605,762', '$32,973,297', '$69,137,047', '$-2,473,712', '$62,667,874', '$83,269,971', '$-9,343,870', '$-11,012,232', '$-2,974,504', '$120,036,382', '$81,120,329', '$3,835,130', '$-12,140,606', '$118,582,776', '$3,101,815', '$48,954,968', '$-14,230,040', '$-1,744,560', '$5,164,458', '$107,956,187', '$-12,180,515', '$31,440,294', '$12,815,212', '$150,297,525', '$21,856,053', '$104,285,432', '$28,716,963', '$-13,379,219', '$-4,718,768', '$7,423,752', '$108,052,686', '$544,368,315', '$-2,331,975', '$-14,800,922', '$42,892,670', '$-213,211', '$3,943,124', '$-2,455,108', '$-5,600,000', '$-14,294,092', '$71,808,942', '$20,000,000']

Checking the number of elements in the 'profit_pgx' list.

In [92]:
len(profit_pgx)
Out[92]:
60
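
The formatted profit list above renders losses as `$-4,565,184`, with the minus sign after the dollar sign. As a minimal sketch, a small helper can put the sign first instead. `format_currency` is a hypothetical name that does not appear in the notebook, and the sample values are copied from the profit_pgx output above.

```python
def format_currency(value):
    """Format a number as whole dollars, placing any minus sign before the '$'."""
    sign = "-" if value < 0 else ""
    return "{}${:,.0f}".format(sign, abs(value))


# Sample values taken from the profit_pgx output above.
sample_profit = [4609597.0, -4565184.0, 447351353.0]
formatted = [format_currency(v) for v in sample_profit]
print(formatted)
```

The same helper could be used for the gross and budget columns, since it leaves non-negative values formatted exactly as the original loops do.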

Creating the Tickets_x column by formatting the ticket counts of the demo1_pg dataframe as comma-separated strings, to be added to the demo1_pg dataframe.

In [247]:
str_tickets_pgx = []
for i in no_tickets_pg:
    str_tickets_pgx.append("{:,.0f}".format(i))
print(str_tickets_pgx) #showing the str_tickets_pgx list 
['970,960', '4,691,829', '54,235,135', '7,398,690', '30,593,772', '21,660,121', '3,810,299', '2,711,800', '53,482', '3,730,633', '4,749,492', '1,934,462', '3,874,173', '11,483,011', '4,354,536', '1,894,842', '343,874', '13,758,706', '6,460,576', '3,347,330', '8,913,705', '852,629', '6,466,787', '10,626,997', '3,565,613', '398,777', '702,550', '15,203,638', '17,112,033', '1,383,513', '1,485,939', '13,458,278', '610,182', '6,395,497', '1,076,996', '3,225,544', '1,516,446', '12,795,619', '281,948', '4,344,029', '1,781,521', '15,729,752', '3,585,605', '11,928,543', '4,071,696', '1,492,078', '328,123', '1,492,375', '12,505,269', '54,936,832', '666,802', '19,908', '6,489,267', '478,679', '844,312', '204,489', '240,000', '170,591', '8,000,894', '4,800,000']

Checking the number of elements in the 'str_tickets_pgx' list.

In [93]:
len(str_tickets_pgx)
Out[93]:
60

Creating the Production_Budget_x column by formatting the budget column of the demo1_pg dataframe as currency strings, to be added to the demo1_pg dataframe.

In [248]:
str_budget_pgx = []
for i in demo1_pg.budget:
    str_budget_pgx.append("${:,.0f}".format(i))
print(str_budget_pgx) #showing the str_budget_pgx list 
['$5,100,000', '$10,000,000', '$95,000,000', '$3,000,000', '$20,000,000', '$40,000,000', '$5,000,000', '$422,000', '$5,100,000', '$72,000,000', '$11,800,000', '$15,000,000', '$32,000,000', '$40,000,000', '$65,000,000', '$8,000,000', '$9,000,000', '$17,000,000', '$30,000,000', '$500,000', '$20,000,000', '$11,000,000', '$2,000,000', '$23,000,000', '$45,000,000', '$15,000,000', '$10,000,000', '$32,000,000', '$90,000,000', '$10,000,000', '$27,000,000', '$16,000,000', '$3,000,000', '$15,000,000', '$25,000,000', '$34,000,000', '$10,000,000', '$20,000,000', '$15,000,000', '$12,000,000', '$5,000,000', '$7,000,000', '$14,000,000', '$15,000,000', '$12,000,000', '$28,300,000', '$8,000,000', '$7,500,000', '$17,000,000', '$5,000,000', '$9,000,000', '$15,000,000', '$22,000,000', '$5,000,000', '$4,500,000', '$4,500,000', '$8,000,000', '$16,000,000', '$8,200,000', '$28,000,000']

Checking the number of elements in the 'str_budget_pgx' list.

In [94]:
len(str_budget_pgx)
Out[94]:
60

The newly created columns are now added to the 'PG' rated movie dataframe demo1_pg, which will later be appended to the Drama_df dataframe.

In [249]:
demo1_pg['Worldwide_Gross_x'] = worldwide_gross_pgx
demo1_pg["Foreign_Gross_x"] = foreign_gross_pgx
demo1_pg['Domestic_Gross_x'] = domestic_gross_pgx
demo1_pg["Profit_x"] = profit_pgx
demo1_pg['Tickets_x'] = str_tickets_pgx
demo1_pg['Production_Budget_x'] = str_budget_pgx

Showing the first five rows of the 'PG' rated dataframe demo1_pg, including the newly added columns.

In [250]:
demo1_pg.head()
Out[250]:
movie rating genre year released score votes director writer star ... Foreign_Gross Domestic_Gross Profit Tickets Worldwide_Gross_x Foreign_Gross_x Domestic_Gross_x Profit_x Tickets_x Production_Budget_x
0 Somewhere in Time PG Drama 1980 October 3, 1980 (United States) 7.2 27000.0 Jeannot Szwarc Richard Matheson Christopher Reeve ... 0 0 4609597.0 970960 $9,709,597 $0 $0 $4,609,597 970,960 $5,100,000
1 Urban Cowboy PG Drama 1980 June 6, 1980 (United States) 6.4 14000.0 James Bridges Aaron Latham John Travolta ... 0 0 36918287.0 4691829 $46,918,287 $0 $0 $36,918,287 4,691,829 $10,000,000
2 Cinderella PG Drama 2015 March 13, 2015 (United States) 6.9 165000.0 Kenneth Branagh Chris Weitz Lily James ... 341200000 201151353 447351353.0 54235135 $542,351,353 $341,200,000 $201,151,353 $447,351,353 54,235,135 $95,000,000
3 War Room PG Drama 2015 August 28, 2015 (United States) 6.5 14000.0 Alex Kendrick Alex Kendrick Priscilla C. Shirer ... 6196787 67790117 70986904.0 7398690 $73,986,904 $6,196,787 $67,790,117 $70,986,904 7,398,690 $3,000,000
4 Wonder PG Drama 2017 November 17, 2017 (United States) 8.0 150000.0 Stephen Chbosky Stephen Chbosky Jacob Tremblay ... 173514909 132422809 285937718.0 30593772 $305,937,718 $173,514,909 $132,422,809 $285,937,718 30,593,772 $20,000,000

5 rows × 26 columns

Showing all the information of the demo1_pg dataframe after adding the new columns.

In [251]:
demo1_pg.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 26 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   movie                60 non-null     object 
 1   rating               60 non-null     object 
 2   genre                60 non-null     object 
 3   year                 60 non-null     int64  
 4   released             60 non-null     object 
 5   score                60 non-null     float64
 6   votes                60 non-null     float64
 7   director             60 non-null     object 
 8   writer               60 non-null     object 
 9   star                 60 non-null     object 
 10  country              60 non-null     object 
 11  budget               60 non-null     float64
 12  gross                60 non-null     float64
 13  company              60 non-null     object 
 14  runtime              60 non-null     float64
 15  Worldwide_Gross      60 non-null     int64  
 16  Foreign_Gross        60 non-null     int64  
 17  Domestic_Gross       60 non-null     int64  
 18  Profit               60 non-null     float64
 19  Tickets              60 non-null     int64  
 20  Worldwide_Gross_x    60 non-null     object 
 21  Foreign_Gross_x      60 non-null     object 
 22  Domestic_Gross_x     60 non-null     object 
 23  Profit_x             60 non-null     object 
 24  Tickets_x            60 non-null     object 
 25  Production_Budget_x  60 non-null     object 
dtypes: float64(6), int64(5), object(15)
memory usage: 12.3+ KB

Showing all the information of the Drama_df dataframe to make sure the demo1_pg dataframe columns align with the Drama_df dataframe columns, so that the two dataframes can be appended to each other.

In [98]:
Drama_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 154 entries, 0 to 153
Data columns (total 23 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Movie                154 non-null    object 
 1   Release_Date         154 non-null    object 
 2   Genre                154 non-null    object 
 3   Rating               154 non-null    object 
 4   Production_Budget    154 non-null    int64  
 5   Production_Budget_x  154 non-null    object 
 6   Domestic_Gross       154 non-null    int64  
 7   Domestic_Gross_x     154 non-null    object 
 8   Foreign_Gross        127 non-null    float64
 9   Foreign_Gross_x      127 non-null    object 
 10  Worldwide_Gross      154 non-null    object 
 11  Worldwide_Gross_x    154 non-null    int64  
 12  Profit               154 non-null    int64  
 13  Profit_x             154 non-null    object 
 14  Tickets              154 non-null    int64  
 15  Tickets_x            154 non-null    object 
 16  Runtime              153 non-null    float64
 17  Averagerating        154 non-null    float64
 18  Company              153 non-null    object 
 19  Studio               154 non-null    object 
 20  Star                 154 non-null    object 
 21  Director             154 non-null    object 
 22  Writer               154 non-null    object 
dtypes: float64(3), int64(5), object(15)
memory usage: 27.8+ KB

Deleting four columns from the demo1_pg dataframe to align it with the Drama_df dataframe.

In [252]:
demo1_pg = demo1_pg.drop(['year',  'votes', 'country', 'gross'], axis=1)

Checking the dataframe and getting the first five rows of the dataframe

In [253]:
demo1_pg.head()
Out[253]:
movie rating genre released score director writer star budget company ... Foreign_Gross Domestic_Gross Profit Tickets Worldwide_Gross_x Foreign_Gross_x Domestic_Gross_x Profit_x Tickets_x Production_Budget_x
0 Somewhere in Time PG Drama October 3, 1980 (United States) 7.2 Jeannot Szwarc Richard Matheson Christopher Reeve 5100000.0 Rastar Pictures ... 0 0 4609597.0 970960 $9,709,597 $0 $0 $4,609,597 970,960 $5,100,000
1 Urban Cowboy PG Drama June 6, 1980 (United States) 6.4 James Bridges Aaron Latham John Travolta 10000000.0 Paramount Pictures ... 0 0 36918287.0 4691829 $46,918,287 $0 $0 $36,918,287 4,691,829 $10,000,000
2 Cinderella PG Drama March 13, 2015 (United States) 6.9 Kenneth Branagh Chris Weitz Lily James 95000000.0 Allison Shearmur Productions ... 341200000 201151353 447351353.0 54235135 $542,351,353 $341,200,000 $201,151,353 $447,351,353 54,235,135 $95,000,000
3 War Room PG Drama August 28, 2015 (United States) 6.5 Alex Kendrick Alex Kendrick Priscilla C. Shirer 3000000.0 FaithStep Films ... 6196787 67790117 70986904.0 7398690 $73,986,904 $6,196,787 $67,790,117 $70,986,904 7,398,690 $3,000,000
4 Wonder PG Drama November 17, 2017 (United States) 8.0 Stephen Chbosky Stephen Chbosky Jacob Tremblay 20000000.0 Lionsgate ... 173514909 132422809 285937718.0 30593772 $305,937,718 $173,514,909 $132,422,809 $285,937,718 30,593,772 $20,000,000

5 rows × 22 columns

Showing all the information of the demo1_pg dataframe to make sure the four columns were deleted.

In [101]:
demo1_pg.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   movie                60 non-null     object 
 1   rating               60 non-null     object 
 2   genre                60 non-null     object 
 3   released             60 non-null     object 
 4   score                60 non-null     float64
 5   director             60 non-null     object 
 6   writer               60 non-null     object 
 7   star                 60 non-null     object 
 8   budget               60 non-null     float64
 9   company              60 non-null     object 
 10  runtime              60 non-null     float64
 11  Worldwide_Gross      60 non-null     int64  
 12  Foreign_Gross        60 non-null     int64  
 13  Domestic_Gross       60 non-null     int64  
 14  Profit               60 non-null     float64
 15  Tickets              60 non-null     int64  
 16  Worldwide_Gross_x    60 non-null     object 
 17  Foreign_Gross_x      60 non-null     object 
 18  Domestic_Gross_x     60 non-null     object 
 19  Profit_x             60 non-null     object 
 20  Tickets_x            60 non-null     object 
 21  Production_Budget_x  60 non-null     object 
dtypes: float64(4), int64(4), object(14)
memory usage: 10.4+ KB

Rearranging the columns in demo1_pg dataframe to align with Drama_df dataframe to be suitable for appending

In [254]:
demo1_pg = demo1_pg[['movie','released','genre','rating','budget','Production_Budget_x',
           'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
           'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','runtime','score',
           'company','star','director','writer']]

Checking the dataframe and getting the first five rows of the dataframe

In [255]:
demo1_pg.head()
Out[255]:
movie released genre rating budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit Profit_x Tickets Tickets_x runtime score company star director writer
0 Somewhere in Time October 3, 1980 (United States) Drama PG 5100000.0 $5,100,000 0 $0 0 $0 ... 4609597.0 $4,609,597 970960 970,960 103.0 7.2 Rastar Pictures Christopher Reeve Jeannot Szwarc Richard Matheson
1 Urban Cowboy June 6, 1980 (United States) Drama PG 10000000.0 $10,000,000 0 $0 0 $0 ... 36918287.0 $36,918,287 4691829 4,691,829 132.0 6.4 Paramount Pictures John Travolta James Bridges Aaron Latham
2 Cinderella March 13, 2015 (United States) Drama PG 95000000.0 $95,000,000 201151353 $201,151,353 341200000 $341,200,000 ... 447351353.0 $447,351,353 54235135 54,235,135 105.0 6.9 Allison Shearmur Productions Lily James Kenneth Branagh Chris Weitz
3 War Room August 28, 2015 (United States) Drama PG 3000000.0 $3,000,000 67790117 $67,790,117 6196787 $6,196,787 ... 70986904.0 $70,986,904 7398690 7,398,690 120.0 6.5 FaithStep Films Priscilla C. Shirer Alex Kendrick Alex Kendrick
4 Wonder November 17, 2017 (United States) Drama PG 20000000.0 $20,000,000 132422809 $132,422,809 173514909 $173,514,909 ... 285937718.0 $285,937,718 30593772 30,593,772 113.0 8.0 Lionsgate Jacob Tremblay Stephen Chbosky Stephen Chbosky

5 rows × 22 columns

Renaming the columns in the demo1_pg dataframe to align with Drama_df dataframe to be suitable for appending

In [256]:
demo1_pg.columns = ['Movie','Release_Date','Genre','Rating','Production_Budget','Production_Budget_x',
           'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
           'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','Runtime','Averagerating',
            'Company','Star','Director','Writer']
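
Assigning a full list to `demo1_pg.columns` only works if the column order from the previous cell is exactly right. As a hedged alternative, the sketch below renames columns by name with `DataFrame.rename`, shown on a small stand-in frame built from the first row above. `toy` and `rename_map` are illustrative names, not part of the notebook, and only the columns whose names change need to be listed.

```python
import pandas as pd

# Small stand-in for demo1_pg, using values from the first row shown above.
toy = pd.DataFrame({
    "movie": ["Somewhere in Time"],
    "released": ["October 3, 1980 (United States)"],
    "genre": ["Drama"],
    "rating": ["PG"],
    "budget": [5100000.0],
    "runtime": [103.0],
    "score": [7.2],
})

# Old name -> new name; columns not listed here keep their names.
rename_map = {
    "movie": "Movie",
    "released": "Release_Date",
    "genre": "Genre",
    "rating": "Rating",
    "budget": "Production_Budget",
    "runtime": "Runtime",
    "score": "Averagerating",
}

toy = toy.rename(columns=rename_map)
print(list(toy.columns))
```

Because the mapping is keyed by the old name, a reordered or missing column cannot silently receive the wrong label.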

Checking the dataframe and getting the first five rows of the dataframe

In [257]:
demo1_pg.head()
Out[257]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
0 Somewhere in Time October 3, 1980 (United States) Drama PG 5100000.0 $5,100,000 0 $0 0 $0 ... 4609597.0 $4,609,597 970960 970,960 103.0 7.2 Rastar Pictures Christopher Reeve Jeannot Szwarc Richard Matheson
1 Urban Cowboy June 6, 1980 (United States) Drama PG 10000000.0 $10,000,000 0 $0 0 $0 ... 36918287.0 $36,918,287 4691829 4,691,829 132.0 6.4 Paramount Pictures John Travolta James Bridges Aaron Latham
2 Cinderella March 13, 2015 (United States) Drama PG 95000000.0 $95,000,000 201151353 $201,151,353 341200000 $341,200,000 ... 447351353.0 $447,351,353 54235135 54,235,135 105.0 6.9 Allison Shearmur Productions Lily James Kenneth Branagh Chris Weitz
3 War Room August 28, 2015 (United States) Drama PG 3000000.0 $3,000,000 67790117 $67,790,117 6196787 $6,196,787 ... 70986904.0 $70,986,904 7398690 7,398,690 120.0 6.5 FaithStep Films Priscilla C. Shirer Alex Kendrick Alex Kendrick
4 Wonder November 17, 2017 (United States) Drama PG 20000000.0 $20,000,000 132422809 $132,422,809 173514909 $173,514,909 ... 285937718.0 $285,937,718 30593772 30,593,772 113.0 8.0 Lionsgate Jacob Tremblay Stephen Chbosky Stephen Chbosky

5 rows × 22 columns

Dropping the 'Studio' column from the Drama_df dataframe to align it with the demo1_pg dataframe, so that the two dataframes can be appended.

In [258]:
Drama_df = Drama_df.drop(['Studio'], axis=1)

Making sure the 'Studio' column was dropped from the Drama_df dataframe.

In [259]:
Drama_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 154 entries, 0 to 153
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Movie                154 non-null    object 
 1   Release_Date         154 non-null    object 
 2   Genre                154 non-null    object 
 3   Rating               154 non-null    object 
 4   Production_Budget    154 non-null    int64  
 5   Production_Budget_x  154 non-null    object 
 6   Domestic_Gross       154 non-null    int64  
 7   Domestic_Gross_x     154 non-null    object 
 8   Foreign_Gross        127 non-null    float64
 9   Foreign_Gross_x      127 non-null    object 
 10  Worldwide_Gross      154 non-null    object 
 11  Worldwide_Gross_x    154 non-null    int64  
 12  Profit               154 non-null    int64  
 13  Profit_x             154 non-null    object 
 14  Tickets              154 non-null    int64  
 15  Tickets_x            154 non-null    object 
 16  Runtime              153 non-null    float64
 17  Averagerating        154 non-null    float64
 18  Company              153 non-null    object 
 19  Star                 154 non-null    object 
 20  Director             154 non-null    object 
 21  Writer               154 non-null    object 
dtypes: float64(3), int64(5), object(14)
memory usage: 26.6+ KB

Checking the demo1_pg dataframe to make sure it aligns with the Drama_df dataframe.

In [260]:
demo1_pg.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Movie                60 non-null     object 
 1   Release_Date         60 non-null     object 
 2   Genre                60 non-null     object 
 3   Rating               60 non-null     object 
 4   Production_Budget    60 non-null     float64
 5   Production_Budget_x  60 non-null     object 
 6   Domestic_Gross       60 non-null     int64  
 7   Domestic_Gross_x     60 non-null     object 
 8   Foreign_Gross        60 non-null     int64  
 9   Foreign_Gross_x      60 non-null     object 
 10  Worldwide_Gross      60 non-null     int64  
 11  Worldwide_Gross_x    60 non-null     object 
 12  Profit               60 non-null     float64
 13  Profit_x             60 non-null     object 
 14  Tickets              60 non-null     int64  
 15  Tickets_x            60 non-null     object 
 16  Runtime              60 non-null     float64
 17  Averagerating        60 non-null     float64
 18  Company              60 non-null     object 
 19  Star                 60 non-null     object 
 20  Director             60 non-null     object 
 21  Writer               60 non-null     object 
dtypes: float64(4), int64(4), object(14)
memory usage: 10.4+ KB
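
Rather than comparing the two `info()` printouts by eye, the alignment can also be checked in code. The sketch below uses two small stand-in frames (`left` and `right` are hypothetical names, not the notebook's data). It compares the column names in order and lists any columns whose dtypes differ.

```python
import pandas as pd

# Stand-ins for Drama_df and demo1_pg with a deliberate dtype difference.
left = pd.DataFrame({"Movie": ["Hugo"], "Production_Budget": [180000000], "Runtime": [126.0]})
right = pd.DataFrame({"Movie": ["Wonder"], "Production_Budget": [20000000.0], "Runtime": [113.0]})

# True only when both frames have the same column names in the same order.
same_columns = list(left.columns) == list(right.columns)

# Columns present in both frames whose dtypes do not match.
dtype_mismatches = {
    col: (str(left[col].dtype), str(right[col].dtype))
    for col in left.columns
    if col in right.columns and left[col].dtype != right[col].dtype
}
print(same_columns)
print(dtype_mismatches)
```

In the printouts above, Production_Budget and Profit are int64 in Drama_df but float64 in demo1_pg. A mismatch like this does not prevent the two frames from being combined, but the combined column is upcast to float.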

Appending the demo1_pg dataframe to the Drama_df dataframe to create the demo_drama dataframe. The remaining system ratings, 'R', 'G' and 'NC-17', will be added to the demo_drama dataframe to complete the dataframe for this analysis.

In [261]:
demo_drama = pd.concat([Drama_df, demo1_pg], ignore_index=True)

Checking demo_drama dataframe.

In [262]:
demo_drama
Out[262]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
0 Hugo Nov 23, 2011 Drama PG 180000000.0 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 ... 47784.0 $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Asa Butterfield Martin Scorsese John Logan
1 The Wolfman Feb 12, 2010 Drama R 150000000.0 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 ... -7365642.0 $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Benicio Del Toro Joe Johnston Andrew Kevin Walker
2 Gravity Oct 4, 2013 Drama PG-13 110000000.0 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 ... 583698673.0 $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. Sandra Bullock Alfonso Cuarón Alfonso Cuarón
3 Django Unchained Dec 25, 2012 Drama R 100000000.0 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 ... 349948323.0 $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Jamie Foxx Quentin Tarantino Quentin Tarantino
4 Sing Dec 21, 2016 Drama PG-13 75000000.0 $75,000,000 270329045 $270,329,045 363800000.0 $363,800,000 ... 559454789.0 $559,454,789 63445479 63,445,479 98.0 7.1 TriStar Pictures Lorraine Bracco Richard Baskin Dean Pitchford
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
209 Testament January 5, 1984 (Argentina) Drama PG 4500000.0 $4,500,000 1537122 $1,537,122 507770.0 $507,770 ... -2455108.0 $-2,455,108 204489 204,489 90.0 7.0 Paramount Pictures Jane Alexander Lynne Littman Carol Amen
210 Table for Five March 10, 1983 (Australia) Drama PG 8000000.0 $8,000,000 0 $0 0.0 $0 ... -5600000.0 $-5,600,000 240000 240,000 122.0 6.1 CBS Theatrical Films Jon Voight Robert Lieberman David Seltzer
211 Man, Woman and Child September 7, 1983 (France) Drama PG 16000000.0 $16,000,000 705908 $705,908 1000000.0 $1,000,000 ... -14294092.0 $-14,294,092 170591 170,591 99.0 6.1 Gaylord Productions Martin Sheen Dick Richards Erich Segal
212 Footloose February 17, 1984 (United States) Drama PG 8200000.0 $8,200,000 80000000 $80,000,000 8942.0 $8,942 ... 71808942.0 $71,808,942 8000894 8,000,894 107.0 6.6 Paramount Pictures Kevin Bacon Herbert Ross Dean Pitchford
213 The Natural May 11, 1984 (United States) Drama PG 28000000.0 $28,000,000 0 $0 0.0 $0 ... 20000000.0 $20,000,000 4800000 4,800,000 138.0 7.5 TriStar Pictures Robert Redford Barry Levinson Bernard Malamud

214 rows × 22 columns

Combining the 'NC-17' rated dataframe demo_nc, the 'G' rated dataframe demo_g and the 'R' rated dataframe demo_r into a single dataframe named demo_rest. This gathers the rest of the new movies that will be added to the demo_drama dataframe to complete the Drama dataframe for this analysis.

In [263]:
demo_rest = pd.concat([demo_nc, demo_g, demo_r], ignore_index=True)

Checking the dataframe and getting the first five rows of the dataframe

In [264]:
demo_rest.head()
Out[264]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 Matador NC-17 Drama 1986 March 7, 1986 (Spain) 7.0 11000.0 Pedro Almodóvar Pedro Almodóvar Assumpta Serna Spain NaN 286126.0 Compañía Iberoamericana de TV 110.0
1 Whore NC-17 Drama 1991 October 18, 1991 (United States) 5.6 3500.0 Ken Russell David Hines Theresa Russell United States NaN 1008404.0 Cheap Date 85.0
2 Tokyo Decadence NC-17 Drama 1992 April 30, 1993 (United States) 6.0 3000.0 Ryû Murakami Ryû Murakami Miho Nikaido Japan NaN 277845.0 Cinemabrain 112.0
3 Wide Sargasso Sea NC-17 Drama 1993 April 16, 1993 (United States) 5.7 1900.0 John Duigan Jan Sharp Karina Lombard Australia NaN 1614784.0 Laughing Kookaburra Productions 98.0
4 Kids NC-17 Drama 1995 September 1, 1995 (United States) 7.1 75000.0 Larry Clark Harmony Korine Leo Fitzpatrick United States 1500000.0 7412216.0 Guys Upstairs 91.0

Dropping the rows that had 'NaN' values in the budget column of demo_rest and for which no budget data could be found online, then resetting the index of the demo_rest dataframe.

In [265]:
demo_rest = demo_rest.drop(labels=[12,16,19,23,27], axis=0)
demo_rest = demo_rest.reset_index(drop=True)

Checking the dataframe and getting the first five rows of the dataframe

In [266]:
demo_rest.head()
Out[266]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 Matador NC-17 Drama 1986 March 7, 1986 (Spain) 7.0 11000.0 Pedro Almodóvar Pedro Almodóvar Assumpta Serna Spain NaN 286126.0 Compañía Iberoamericana de TV 110.0
1 Whore NC-17 Drama 1991 October 18, 1991 (United States) 5.6 3500.0 Ken Russell David Hines Theresa Russell United States NaN 1008404.0 Cheap Date 85.0
2 Tokyo Decadence NC-17 Drama 1992 April 30, 1993 (United States) 6.0 3000.0 Ryû Murakami Ryû Murakami Miho Nikaido Japan NaN 277845.0 Cinemabrain 112.0
3 Wide Sargasso Sea NC-17 Drama 1993 April 16, 1993 (United States) 5.7 1900.0 John Duigan Jan Sharp Karina Lombard Australia NaN 1614784.0 Laughing Kookaburra Productions 98.0
4 Kids NC-17 Drama 1995 September 1, 1995 (United States) 7.1 75000.0 Larry Clark Harmony Korine Leo Fitzpatrick United States 1500000.0 7412216.0 Guys Upstairs 91.0

The 'budget' column in the demo_rest dataframe still has 'NaN' elements. The cell below replaces every 'NaN' in the demo_rest dataframe with '0', so that each '0' can be replaced with the movie's actual budget later on.

In [267]:
demo_rest = demo_rest.fillna(0)

Checking the dataframe and getting the first five rows of the dataframe

In [268]:
demo_rest.head()
Out[268]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 Matador NC-17 Drama 1986 March 7, 1986 (Spain) 7.0 11000.0 Pedro Almodóvar Pedro Almodóvar Assumpta Serna Spain 0.0 286126.0 Compañía Iberoamericana de TV 110.0
1 Whore NC-17 Drama 1991 October 18, 1991 (United States) 5.6 3500.0 Ken Russell David Hines Theresa Russell United States 0.0 1008404.0 Cheap Date 85.0
2 Tokyo Decadence NC-17 Drama 1992 April 30, 1993 (United States) 6.0 3000.0 Ryû Murakami Ryû Murakami Miho Nikaido Japan 0.0 277845.0 Cinemabrain 112.0
3 Wide Sargasso Sea NC-17 Drama 1993 April 16, 1993 (United States) 5.7 1900.0 John Duigan Jan Sharp Karina Lombard Australia 0.0 1614784.0 Laughing Kookaburra Productions 98.0
4 Kids NC-17 Drama 1995 September 1, 1995 (United States) 7.1 75000.0 Larry Clark Harmony Korine Leo Fitzpatrick United States 1500000.0 7412216.0 Guys Upstairs 91.0

Getting the index of every '0' element in the budget column of the demo_rest dataframe.

In [269]:
nan_index = []
for i, x in enumerate(demo_rest.budget):
    if x == 0.0:
        nan_index.append(i)
print(nan_index)  # showing the nan_index list
[0, 1, 2, 3, 7, 9, 12, 13, 14, 15, 16, 18, 22, 23, 24, 26, 29, 30]

Checking the number of elements in the 'nan_index' list.

In [270]:
len(nan_index)
Out[270]:
18
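
The loop above can also be expressed as a boolean mask. The sketch below runs on a small stand-in frame (`toy` and `missing_idx` are hypothetical names). It uses `isna()` so the missing budgets can be located without first replacing them with 0, which may be preferable because `fillna(0)` called on a whole dataframe fills missing values in every column, not just `budget`.

```python
import pandas as pd

# Stand-in for demo_rest with two missing budgets.
toy = pd.DataFrame({
    "movie": ["Matador", "Whore", "Kids"],
    "budget": [float("nan"), float("nan"), 1500000.0],
})

# Index labels of the rows whose budget is missing.
missing_idx = toy.index[toy["budget"].isna()].tolist()
print(missing_idx)
```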

The actual budgets of the movies in the demo_rest dataframe whose budget was set to '0'.

In [6]:
budget = [12500000, 1000000, 20000, 955472, 5000000, 2734384,
         4000000, 35446775, 700000, 8600000, 7000000, 4400000,
         8500000, 20000000, 100000, 6500000, 11500000, 9000000]
print(budget) #showing the budget list 
[12500000, 1000000, 20000, 955472, 5000000, 2734384, 4000000, 35446775, 700000, 8600000, 7000000, 4400000, 8500000, 20000000, 100000, 6500000, 11500000, 9000000]

Replacing every '0' element in the budget column of the demo_rest dataframe with the actual budget of the movie.

In [273]:
for i,x in enumerate(nan_index):
    demo_rest.loc[x ,'budget'] = budget[i]
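
This replacement relies on the `budget` list lining up one-to-one with `nan_index`, so a length check before assigning can guard against silent misalignment. The sketch below shows the idea on a small stand-in frame. `toy`, `missing_idx` and `looked_up` are hypothetical names, and the three budget values are the first three entries of the `budget` list above.

```python
import pandas as pd

# Stand-in for demo_rest after the NaN budgets were replaced with 0.
toy = pd.DataFrame({
    "movie": ["Matador", "Whore", "Tokyo Decadence", "Kids"],
    "budget": [0.0, 0.0, 0.0, 1500000.0],
})

missing_idx = toy.index[toy["budget"] == 0].tolist()
looked_up = [12500000, 1000000, 20000]  # one looked-up budget per missing row, in order

# Refuse to assign if the two lists cannot be paired one-to-one.
if len(missing_idx) != len(looked_up):
    raise ValueError("need exactly one budget per missing row")

# Assign all looked-up budgets in a single .loc call.
toy.loc[missing_idx, "budget"] = looked_up
print(toy["budget"].tolist())
```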

Showing the first five rows of the 'G', 'R' and 'NC-17' rated movie dataframe demo_rest, with the new data replacing the '0' values in the budget column.

In [274]:
demo_rest.head()
Out[274]:
movie rating genre year released score votes director writer star country budget gross company runtime
0 Matador NC-17 Drama 1986 March 7, 1986 (Spain) 7.0 11000.0 Pedro Almodóvar Pedro Almodóvar Assumpta Serna Spain 12500000.0 286126.0 Compañía Iberoamericana de TV 110.0
1 Whore NC-17 Drama 1991 October 18, 1991 (United States) 5.6 3500.0 Ken Russell David Hines Theresa Russell United States 1000000.0 1008404.0 Cheap Date 85.0
2 Tokyo Decadence NC-17 Drama 1992 April 30, 1993 (United States) 6.0 3000.0 Ryû Murakami Ryû Murakami Miho Nikaido Japan 20000.0 277845.0 Cinemabrain 112.0
3 Wide Sargasso Sea NC-17 Drama 1993 April 16, 1993 (United States) 5.7 1900.0 John Duigan Jan Sharp Karina Lombard Australia 955472.0 1614784.0 Laughing Kookaburra Productions 98.0
4 Kids NC-17 Drama 1995 September 1, 1995 (United States) 7.1 75000.0 Larry Clark Harmony Korine Leo Fitzpatrick United States 1500000.0 7412216.0 Guys Upstairs 91.0

This is the Worldwide Gross of the new 'G', 'R' and 'NC-17' rated movies that will be added to the demo_rest dataframe, to be later appended to the demo_drama dataframe to complete the dataframe for this analysis.

In [7]:
worldwide_rest = [17356268, 1008404, 277845, 1614784, 20412216, 20350754, 98410061, 496059,
                 15121165, 1022148, 67091915, 20412841, 19465835, 195494, 2411143,
                 1025228, 18587135, 8721243, 40300, 10015449, 80693537,
                 54766923, 77211836, 34718173, 1951683, 636796, 2447576, 9171289, 
                 3256082, 13000000, 11000000]
print(worldwide_rest) #showing the worldwide_rest list 
[17356268, 1008404, 277845, 1614784, 20412216, 20350754, 98410061, 496059, 15121165, 1022148, 67091915, 20412841, 19465835, 195494, 2411143, 1025228, 18587135, 8721243, 40300, 10015449, 80693537, 54766923, 77211836, 34718173, 1951683, 636796, 2447576, 9171289, 3256082, 13000000, 11000000]

Checking the number of elements in the 'worldwide_rest' list.

In [276]:
len(worldwide_rest)
Out[276]:
31

This is the Domestic Gross of the new 'G', 'R' and 'NC-17' rated movies that will be added to the demo_rest dataframe, to be later appended to the demo_drama dataframe to complete the dataframe for this analysis.

In [8]:
domestic_rest = [12594698, 0, 0, 0, 7412216, 0, 54580300, 0, 2532228, 71616, 4604982,  
                 4002293, 2199787, 0, 0, 0, 0, 0, 0, 0, 75600072, 0, 22455510, 23438250, 
                 1596371, 0, 0, 0, 0, 0, 0]
print(domestic_rest) #showing the domestic_rest list 
[12594698, 0, 0, 0, 7412216, 0, 54580300, 0, 2532228, 71616, 4604982, 4002293, 2199787, 0, 0, 0, 0, 0, 0, 0, 75600072, 0, 22455510, 23438250, 1596371, 0, 0, 0, 0, 0, 0]

Checking the number of elements in the 'domestic_rest' list.

In [121]:
len(domestic_rest)
Out[121]:
31

This is the Foreign Gross of the new 'G', 'R' and 'NC-17' rated movies that will be added to the demo_rest dataframe, to be later appended to the demo_drama dataframe to complete the dataframe for this analysis. It is calculated by subtracting the Domestic Gross from the Worldwide Gross of each movie; where the Domestic Gross is 0, the Foreign Gross is also set to 0.

In [278]:
foreign_rest = []
for i, x in enumerate(worldwide_rest):
    if domestic_rest[i] == 0:
        foreign_rest.append(0)
    else:
        foreign_rest.append(x - domestic_rest[i])
print(foreign_rest)  # showing the foreign_rest list
[4761570, 0, 0, 0, 13000000, 0, 43829761, 0, 12588937, 950532, 62486933, 16410548, 17266048, 0, 0, 0, 0, 0, 0, 0, 5093465, 0, 54756326, 11279923, 355312, 0, 0, 0, 0, 0, 0]

Checking the number of elements in the 'foreign_rest' list.

In [279]:
len(foreign_rest)
Out[279]:
31
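
The foreign gross rule above can be sketched as a single function that pairs the two lists with `zip` instead of indexing by position. `foreign_gross` is a hypothetical helper name, and the sample inputs are the first five entries of `worldwide_rest` and `domestic_rest` from the cells above.

```python
def foreign_gross(worldwide, domestic):
    """Worldwide minus domestic, or 0 when no domestic figure is recorded."""
    return [0 if d == 0 else w - d for w, d in zip(worldwide, domestic)]


# First five entries of worldwide_rest and domestic_rest from the cells above.
worldwide_sample = [17356268, 1008404, 277845, 1614784, 20412216]
domestic_sample = [12594698, 0, 0, 0, 7412216]
foreign_sample = foreign_gross(worldwide_sample, domestic_sample)
print(foreign_sample)
```

Note that under this rule a foreign gross of 0 means the domestic figure was recorded as 0, not necessarily that the movie earned nothing abroad.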

This is the Profit of the new 'G', 'R' and 'NC-17' rated movies that will be added to the demo_rest dataframe, to be later appended to the demo_drama dataframe to complete the dataframe for this analysis. It was calculated by subtracting the Budget of each movie from its Worldwide Gross.

In [280]:
profit_rest = []
for i,x in enumerate(worldwide_rest):
    profit_rest.append(x-demo_rest.budget[i])
print(profit_rest) #showing the profit_rest list 
[4856268.0, 8404.0, 257845.0, 659312.0, 18912216.0, -24649246.0, 89410061.0, -4503941.0, 121165.0, -1712236.0, 52091915.0, 13912841.0, 15465835.0, -35251281.0, 1711143.0, -7574772.0, 11587135.0, -9278757.0, -4359700.0, -6984551.0, 58693537.0, 48766923.0, 68711836.0, 14718173.0, 1851683.0, -25363204.0, -4052424.0, -12828711.0, 556082.0, 1500000.0, 2000000.0]

Checking the number of elements in the 'profit_rest' list.

In [281]:
len(profit_rest)
Out[281]:
31

This is the number of Tickets for the new 'G', 'R' and 'NC-17' rated movies that will be added to the demo_rest dataframe, to be later appended to the demo_drama dataframe to complete the dataframe for this analysis. It was calculated by dividing the Worldwide Gross of each movie by 10 and rounding to the nearest whole number.

In [282]:
no_tickets_rest = []
for i in worldwide_rest:
    no_tickets_rest.append(round(i/10))
print(no_tickets_rest) #showing the no_tickets_rest list 
[1735627, 100840, 27784, 161478, 2041222, 2035075, 9841006, 49606, 1512116, 102215, 6709192, 2041284, 1946584, 19549, 241114, 102523, 1858714, 872124, 4030, 1001545, 8069354, 5476692, 7721184, 3471817, 195168, 63680, 244758, 917129, 325608, 1300000, 1100000]

Checking the number of elements in the 'no_tickets_rest' list.

In [283]:
len(no_tickets_rest)
Out[283]:
31
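
The profit and ticket calculations above can be checked on a few values. This minimal sketch uses the first three entries of `worldwide_rest` and of the `budget` list from the earlier cells; the `_sample` names are hypothetical.

```python
# First three entries of worldwide_rest and of the looked-up budget list above.
worldwide_sample = [17356268, 1008404, 277845]
budget_sample = [12500000, 1000000, 20000]

# Profit = worldwide gross minus budget, paired with zip rather than by index.
profit_sample = [w - b for w, b in zip(worldwide_sample, budget_sample)]

# Tickets = worldwide gross divided by 10, rounded as in the notebook.
tickets_sample = [round(w / 10) for w in worldwide_sample]

print(profit_sample)
print(tickets_sample)
```

Python's built-in `round` rounds exact halves to the nearest even integer, which is why 277845 / 10 = 27784.5 becomes 27784 rather than 27785 in the list above.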

Creating the Worldwide_Gross_x column by formatting the worldwide gross list (worldwide_rest) as currency strings, to be added to the demo_rest dataframe.

In [284]:
worldwide_gross_restx = []
for i in worldwide_rest:
    worldwide_gross_restx.append("${:,.0f}".format(i))
print(worldwide_gross_restx) #showing the worldwide_gross_restx list 
['$17,356,268', '$1,008,404', '$277,845', '$1,614,784', '$20,412,216', '$20,350,754', '$98,410,061', '$496,059', '$15,121,165', '$1,022,148', '$67,091,915', '$20,412,841', '$19,465,835', '$195,494', '$2,411,143', '$1,025,228', '$18,587,135', '$8,721,243', '$40,300', '$10,015,449', '$80,693,537', '$54,766,923', '$77,211,836', '$34,718,173', '$1,951,683', '$636,796', '$2,447,576', '$9,171,289', '$3,256,082', '$13,000,000', '$11,000,000']

Checking the number of elements in the 'worldwide_gross_restx' list.

In [127]:
len(worldwide_gross_restx)
Out[127]:
31

Creating the Domestic_Gross_x column by turning the domestic gross list (domestic_rest) into currency. To be added to the demo_rest dataframe.

In [285]:
domestic_gross_restx = []
for i in domestic_rest:
    domestic_gross_restx.append("${:,.0f}".format(i))
print(domestic_gross_restx) #showing the domestic_gross_restx list 
['$12,594,698', '$0', '$0', '$0', '$7,412,216', '$0', '$54,580,300', '$0', '$2,532,228', '$71,616', '$4,604,982', '$4,002,293', '$2,199,787', '$0', '$0', '$0', '$0', '$0', '$0', '$0', '$75,600,072', '$0', '$22,455,510', '$23,438,250', '$1,596,371', '$0', '$0', '$0', '$0', '$0', '$0']

Checking the number of elements in the 'domestic_gross_restx' list.

In [286]:
len(domestic_gross_restx)
Out[286]:
31

Creating the Foreign_Gross_x column by turning the foreign gross list (foreign_rest) into currency. To be added to the demo_rest dataframe.

In [287]:
foreign_gross_restx = []
for i in foreign_rest:
    foreign_gross_restx.append("${:,.0f}".format(i))
print(foreign_gross_restx) #showing the foreign_gross_restx list 
['$4,761,570', '$0', '$0', '$0', '$13,000,000', '$0', '$43,829,761', '$0', '$12,588,937', '$950,532', '$62,486,933', '$16,410,548', '$17,266,048', '$0', '$0', '$0', '$0', '$0', '$0', '$0', '$5,093,465', '$0', '$54,756,326', '$11,279,923', '$355,312', '$0', '$0', '$0', '$0', '$0', '$0']

Checking the number of elements in the 'foreign_gross_restx' list.

In [129]:
len(foreign_gross_restx)
Out[129]:
31

Creating the Profit_x column by turning the profit list (profit_rest) into currency. To be added to the demo_rest dataframe.

In [288]:
profit_restx = []
for i in profit_rest:
    profit_restx.append("${:,.0f}".format(i))
print(profit_restx) #showing the profit_restx list 
['$4,856,268', '$8,404', '$257,845', '$659,312', '$18,912,216', '$-24,649,246', '$89,410,061', '$-4,503,941', '$121,165', '$-1,712,236', '$52,091,915', '$13,912,841', '$15,465,835', '$-35,251,281', '$1,711,143', '$-7,574,772', '$11,587,135', '$-9,278,757', '$-4,359,700', '$-6,984,551', '$58,693,537', '$48,766,923', '$68,711,836', '$14,718,173', '$1,851,683', '$-25,363,204', '$-4,052,424', '$-12,828,711', '$556,082', '$1,500,000', '$2,000,000']

Checking the number of elements in the 'profit_restx' list.

In [130]:
len(profit_restx)
Out[130]:
31
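
The format string used above writes losses as '$-24,649,246', with the minus sign after the dollar sign. If the more conventional '-$24,649,246' is preferred, a small helper could be used instead. This is only a sketch; format_currency is a hypothetical name, not a function from the notebook.

```python
def format_currency(amount):
    """Format a number as whole dollars, placing any minus sign before the '$'."""
    sign = "-" if amount < 0 else ""
    return "{}${:,.0f}".format(sign, abs(amount))

print(format_currency(4856268.0))    # $4,856,268
print(format_currency(-24649246.0))  # -$24,649,246
```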

Creating the Tickets_x column by turning the ticket list (no_tickets_rest) into comma-separated strings. To be added to the demo_rest dataframe.

In [289]:
str_tickets_restx = []
for i in no_tickets_rest:
    str_tickets_restx.append("{:,.0f}".format(i))
print(str_tickets_restx) #showing the str_tickets_restx list 
['1,735,627', '100,840', '27,784', '161,478', '2,041,222', '2,035,075', '9,841,006', '49,606', '1,512,116', '102,215', '6,709,192', '2,041,284', '1,946,584', '19,549', '241,114', '102,523', '1,858,714', '872,124', '4,030', '1,001,545', '8,069,354', '5,476,692', '7,721,184', '3,471,817', '195,168', '63,680', '244,758', '917,129', '325,608', '1,300,000', '1,100,000']

Checking the number of elements in the 'str_tickets_restx' list.

In [131]:
len(str_tickets_restx)
Out[131]:
31

Creating the Production_Budget_x column by turning the demo_rest budget column into currency. To be added to the demo_rest dataframe.

In [290]:
str_budget_restx = []
for i in demo_rest.budget:
    str_budget_restx.append("${:,.0f}".format(i))
print(str_budget_restx) #showing the str_budget_restx list 
['$12,500,000', '$1,000,000', '$20,000', '$955,472', '$1,500,000', '$45,000,000', '$9,000,000', '$5,000,000', '$15,000,000', '$2,734,384', '$15,000,000', '$6,500,000', '$4,000,000', '$35,446,775', '$700,000', '$8,600,000', '$7,000,000', '$18,000,000', '$4,400,000', '$17,000,000', '$22,000,000', '$6,000,000', '$8,500,000', '$20,000,000', '$100,000', '$26,000,000', '$6,500,000', '$22,000,000', '$2,700,000', '$11,500,000', '$9,000,000']

Checking the number of elements in the 'str_budget_restx' list.

In [291]:
len(str_budget_restx)
Out[291]:
31

Deleting four columns from the demo_rest dataframe to align it with the demo_drama dataframe, so that it is suitable for merging to complete the dataframe for this analysis.

In [292]:
demo_rest = demo_rest.drop(['year',  'votes', 'country', 'gross'], axis=1)

After creating all the final columns, they are added to the demo_rest dataframe, which is later appended to the demo_drama dataframe to complete the dataframe for this analysis.

In [293]:
demo_rest['Worldwide_Gross'] = worldwide_rest
demo_rest["Foreign_Gross"] = foreign_rest
demo_rest['Domestic_Gross'] = domestic_rest
demo_rest["Profit"] = profit_rest
demo_rest['Tickets'] = no_tickets_rest

demo_rest['Worldwide_Gross_x'] = worldwide_gross_restx
demo_rest["Foreign_Gross_x"] = foreign_gross_restx
demo_rest['Domestic_Gross_x'] = domestic_gross_restx
demo_rest["Profit_x"] = profit_restx
demo_rest['Tickets_x'] = str_tickets_restx
demo_rest['Production_Budget_x'] = str_budget_restx

Rearranging the columns in demo_rest dataframe to align with demo_drama dataframe to be suitable for appending, creating the final dataframe Drama_DataFrame for this analysis

In [294]:
demo_rest = demo_rest[['movie','released','genre','rating','budget','Production_Budget_x',
           'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
           'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','runtime','score',
           'company','star','director','writer']]

Checking the dataframe and getting the first five rows of the dataframe

In [295]:
demo_rest.head()
Out[295]:
movie released genre rating budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit Profit_x Tickets Tickets_x runtime score company star director writer
0 Matador March 7, 1986 (Spain) Drama NC-17 12500000.0 $12,500,000 12594698 $12,594,698 4761570 $4,761,570 ... 4856268.0 $4,856,268 1735627 1,735,627 110.0 7.0 Compañía Iberoamericana de TV Assumpta Serna Pedro Almodóvar Pedro Almodóvar
1 Whore October 18, 1991 (United States) Drama NC-17 1000000.0 $1,000,000 0 $0 0 $0 ... 8404.0 $8,404 100840 100,840 85.0 5.6 Cheap Date Theresa Russell Ken Russell David Hines
2 Tokyo Decadence April 30, 1993 (United States) Drama NC-17 20000.0 $20,000 0 $0 0 $0 ... 257845.0 $257,845 27784 27,784 112.0 6.0 Cinemabrain Miho Nikaido Ryû Murakami Ryû Murakami
3 Wide Sargasso Sea April 16, 1993 (United States) Drama NC-17 955472.0 $955,472 0 $0 0 $0 ... 659312.0 $659,312 161478 161,478 98.0 5.7 Laughing Kookaburra Productions Karina Lombard John Duigan Jan Sharp
4 Kids September 1, 1995 (United States) Drama NC-17 1500000.0 $1,500,000 7412216 $7,412,216 13000000 $13,000,000 ... 18912216.0 $18,912,216 2041222 2,041,222 91.0 7.1 Guys Upstairs Leo Fitzpatrick Larry Clark Harmony Korine

5 rows × 22 columns

Renaming the columns in the demo_rest dataframe to align with demo_drama dataframe to be suitable for appending, creating the final dataframe Drama_DataFrame for this analysis

In [296]:
demo_rest.columns = ['Movie','Release_Date','Genre','Rating','Production_Budget','Production_Budget_x',
           'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
           'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','Runtime','Averagerating',
            'Company','Star','Director','Writer']

Checking the first five rows of the demo_rest dataframe to make sure it aligns with the demo_drama dataframe before appending the two dataframes to each other to create Drama_DataFrame, the final dataframe that will be used for this analysis.

In [297]:
demo_rest.head()
Out[297]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
0 Matador March 7, 1986 (Spain) Drama NC-17 12500000.0 $12,500,000 12594698 $12,594,698 4761570 $4,761,570 ... 4856268.0 $4,856,268 1735627 1,735,627 110.0 7.0 Compañía Iberoamericana de TV Assumpta Serna Pedro Almodóvar Pedro Almodóvar
1 Whore October 18, 1991 (United States) Drama NC-17 1000000.0 $1,000,000 0 $0 0 $0 ... 8404.0 $8,404 100840 100,840 85.0 5.6 Cheap Date Theresa Russell Ken Russell David Hines
2 Tokyo Decadence April 30, 1993 (United States) Drama NC-17 20000.0 $20,000 0 $0 0 $0 ... 257845.0 $257,845 27784 27,784 112.0 6.0 Cinemabrain Miho Nikaido Ryû Murakami Ryû Murakami
3 Wide Sargasso Sea April 16, 1993 (United States) Drama NC-17 955472.0 $955,472 0 $0 0 $0 ... 659312.0 $659,312 161478 161,478 98.0 5.7 Laughing Kookaburra Productions Karina Lombard John Duigan Jan Sharp
4 Kids September 1, 1995 (United States) Drama NC-17 1500000.0 $1,500,000 7412216 $7,412,216 13000000 $13,000,000 ... 18912216.0 $18,912,216 2041222 2,041,222 91.0 7.1 Guys Upstairs Leo Fitzpatrick Larry Clark Harmony Korine

5 rows × 22 columns

Checking the first five rows of the demo_drama dataframe to make sure it aligns with the demo_rest dataframe before appending the two dataframes to each other to create Drama_DataFrame, the final dataframe that will be used for this analysis.

In [298]:
demo_drama.head()
Out[298]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
0 Hugo Nov 23, 2011 Drama PG 180000000.0 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 ... 47784.0 $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Asa Butterfield Martin Scorsese John Logan
1 The Wolfman Feb 12, 2010 Drama R 150000000.0 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 ... -7365642.0 $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Benicio Del Toro Joe Johnston Andrew Kevin Walker
2 Gravity Oct 4, 2013 Drama PG-13 110000000.0 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 ... 583698673.0 $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. Sandra Bullock Alfonso Cuarón Alfonso Cuarón
3 Django Unchained Dec 25, 2012 Drama R 100000000.0 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 ... 349948323.0 $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Jamie Foxx Quentin Tarantino Quentin Tarantino
4 Sing Dec 21, 2016 Drama PG-13 75000000.0 $75,000,000 270329045 $270,329,045 363800000.0 $363,800,000 ... 559454789.0 $559,454,789 63445479 63,445,479 98.0 7.1 TriStar Pictures Lorraine Bracco Richard Baskin Dean Pitchford

5 rows × 22 columns

Appending the demo_rest dataframe to the demo_drama dataframe to create Drama_DataFrame, the final dataframe that will be used for this analysis.

In [299]:
# DataFrame.append is deprecated (removed in pandas 2.0); pd.concat gives the same result.
# Assumes pandas is imported as pd earlier in the notebook.
Drama_DataFrame = pd.concat([demo_drama, demo_rest], ignore_index=True)

Rearranging the columns in Drama_DataFrame.

In [300]:
# 'Worldwide_Gross' must come before 'Worldwide_Gross_x' here: the next cell renames
# columns by position, so the reverse order would swap the two labels.
Drama_DataFrame = Drama_DataFrame[['Movie','Release_Date','Genre','Rating','Production_Budget','Production_Budget_x',
           'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
           'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','Runtime','Averagerating',
            'Company','Star','Director','Writer']]

Renaming the columns in the Drama_DataFrame.

In [301]:
Drama_DataFrame.columns = ['Movie','Release_Date','Genre','Rating','Production_Budget','Production_Budget_x',
           'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
           'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','Runtime','Averagerating',
            'Company','Star','Director','Writer']

The new 'Drama_DataFrame' dataframe.

In [304]:
Drama_DataFrame.head()
Out[304]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
0 Hugo Nov 23, 2011 Drama PG 180000000.0 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 ... 47784.0 $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Asa Butterfield Martin Scorsese John Logan
1 The Wolfman Feb 12, 2010 Drama R 150000000.0 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 ... -7365642.0 $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Benicio Del Toro Joe Johnston Andrew Kevin Walker
2 Gravity Oct 4, 2013 Drama PG-13 110000000.0 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 ... 583698673.0 $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. Sandra Bullock Alfonso Cuarón Alfonso Cuarón
3 Django Unchained Dec 25, 2012 Drama R 100000000.0 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 ... 349948323.0 $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Jamie Foxx Quentin Tarantino Quentin Tarantino
4 Sing Dec 21, 2016 Drama PG-13 75000000.0 $75,000,000 270329045 $270,329,045 363800000.0 $363,800,000 ... 559454789.0 $559,454,789 63445479 63,445,479 98.0 7.1 TriStar Pictures Lorraine Bracco Richard Baskin Dean Pitchford

5 rows × 22 columns

The 'G' rating does not have enough movies to be analyzed, so more 'G' rated movies will be added to the Drama_DataFrame dataframe for appropriate analysis.

These are the names of the new 'G-rated' movies that will be stored in the 'g_name' list, for the 'Movie' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [305]:
g_name  = ['Beauty and the Beast 1991' , 'The Little Rascals', 'Ramona and Beezus',
           'The Black Stallion', 'The Hunchback of Notre Dame', 'Babe', 'Pollyanna',
           'Babe: Pig in the City', 'Lassie Come Home', 'Charlotte\'s Web', 'A Little Princess',
           'Kit Kittredge: An American Girl', 'The Rookie', 'The Secret Garden', 'The Sound of Music',
           'The Tale of Despereaux', 'The Lion King 1994', 'Bambi 1942', 'My Fair Lady 1964',
           'Before the Wrath', 'Hachiko: A Dog\'s Story', 'Giant', 'The Ten Commandments 1966',
           'The Quiet Man', 'Three Coins in the Fountain', 'Miracle of Marcelino']

The 'g_name' list.

In [89]:
print(g_name)
['Beauty and the Beast 1991', 'The Little Rascals', 'Ramona and Beezus', 'The Black Stallion', 'The Hunchback of Notre Dame', 'Babe', 'Pollyanna', 'Babe: Pig in the City', 'Lassie Come Home', "Charlotte's Web", 'A Little Princess', 'Kit Kittredge: An American Girl', 'The Rookie', 'The Secret Garden', 'The Sound of Music', 'The Tale of Despereaux', 'The Lion King 1994', 'Bambi 1942', 'My Fair Lady 1964', 'Before the Wrath', "Hachiko: A Dog's Story", 'Giant', 'The Ten Commandments 1966', 'The Quiet Man', 'Three Coins in the Fountain', 'Miracle of Marcelino']

Checking the number of elements in the 'g_name' list.

In [142]:
len(g_name)
Out[142]:
26

These are the writers of the new 'G-rated' movies that will be stored in the 'g_writer' list, for the 'Writer' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [306]:
g_writer = ['Linda Woolverton','Penelope Spheeris','Beverly Cleary','Melissa Mathison','Victor Hugo',
            'Chris Noonan','Eleanor Hodgman Porter','Judy Morris','Eric Knight','E. B. White',
            'Frances Hodgson Burnett','Valerie Tripp','Mike Rich','Caroline Thompson','Ernest Lehman',
            'Kate DiCamillo','Jonathan Roberts','Perce Pearce','Alan Jay Lerner','Brent Miller Jr.',
            'Stephen P. Lindsey','Edna Ferber',' Fredric M. Frank','Frank S. Nugent','John Patrick',
            'José María Sánchez-Silva']

The 'g_writer' list.

In [93]:
print(g_writer)
['Linda Woolverton', 'Penelope Spheeris', 'Beverly Cleary', 'Melissa Mathison', 'Victor Hugo', 'Chris Noonan', 'Eleanor Hodgman Porter', 'Judy Morris', 'Eric Knight', 'E. B. White', 'Frances Hodgson Burnett', 'Valerie Tripp', 'Mike Rich', 'Caroline Thompson', 'Ernest Lehman', 'Kate DiCamillo', 'Jonathan Roberts', 'Perce Pearce', 'Alan Jay Lerner', 'Brent Miller Jr.', 'Stephen P. Lindsey', 'Edna Ferber', ' Fredric M. Frank', 'Frank S. Nugent', 'John Patrick', 'José María Sánchez-Silva']

Checking the number of elements in the 'g_writer' list.

In [143]:
len(g_writer)
Out[143]:
26
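
Each list in this section is checked with its own len() call. As a sketch, the same check could be done once for all lists by collecting them in a dictionary keyed by column name. The helper check_equal_lengths and the shortened sample lists are hypothetical and are not part of the notebook.

```python
def check_equal_lengths(columns):
    """Return the shared length of all lists in `columns`, or raise ValueError."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError("Column lengths differ: {}".format(lengths))
    return next(iter(lengths.values()), 0)

# Shortened illustrative lists, not the full notebook data.
sample_columns = {
    "Movie": ["Babe", "Pollyanna", "Giant"],
    "Writer": ["Chris Noonan", "Eleanor Hodgman Porter", "Edna Ferber"],
}
print(check_equal_lengths(sample_columns))  # 3
```

Because the error message lists every column with its length, a list that is one entry short is easy to spot before the dataframe is built.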

These are the release dates of the new 'G-rated' movies that will be stored in the 'g_date' list, for the 'Release_Date' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [307]:
g_date = ['November 22, 1991', 'August 5, 1994', 'July 23, 2010', 'October 17, 1979', 'June 21, 1996',
         'August 4, 1995', 'May 19, 1960', 'November 25, 1998', 'December 1943', 'October 15, 1952',
         'May 10, 1995', 'July 2, 2008', 'March 29, 2002', 'April 13, 1993', 'April 1, 1993',
         'December 19, 2008', 'June 24, 1994', 'August 21, 1942', 'December 25, 1964', 'March 3, 2020',
         'March 12, 2010', 'November 24, 1952', 'October 5, 1956', 'September 14, 1954', 'May 15, 1954',
         'October 22, 1956']

The 'g_date' list.

In [96]:
print(g_date)
['November 22, 1991', 'August 5, 1994', 'July 23, 2010', 'October 17, 1979', 'June 21, 1996', 'August 4, 1995', 'May 19, 1960', 'November 25, 1998', 'December 1943', 'October 15, 1952', 'May 10, 1995', 'July 2, 2008', 'March 29, 2002', 'April 13, 1993', 'April 1, 1993', 'December 19, 2008', 'June 24, 1994', 'August 21, 1942', 'December 25, 1964', 'March 3, 2020', 'March 12, 2010', 'November 24, 1952', 'October 5, 1956', 'September 14, 1954', 'May 15, 1954', 'October 22, 1956']

Checking the number of elements in the 'g_date' list.

In [144]:
len(g_date)
Out[144]:
26
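
Hand-typed dates are easy to get wrong (a misspelled month, a missing comma, a missing day), and such entries would later fail to parse as dates. A minimal sketch of a check using the standard library is shown below. The helper find_unparseable_dates and the sample strings are illustrative, and the 'Month day, year' format with English month names (which depends on the locale) is an assumption.

```python
from datetime import datetime

def find_unparseable_dates(dates, fmt="%B %d, %Y"):
    """Return the entries that do not match `fmt` (for example 'June 21, 1996')."""
    bad = []
    for text in dates:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            bad.append(text)
    return bad

# Illustrative strings: one valid date and three common kinds of mistakes.
sample_dates = ["June 21, 1996", "Novemeber 22, 1991", "December 1943", "April 13 1993"]
print(find_unparseable_dates(sample_dates))
```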

These are the production budget of the new 'G-rated' movies that will be stored in the 'g_budget' list, for the 'Production_Budget' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [308]:
g_budget = [20000000, 23000000, 15000000, 2700000, 70000000, 30000000, 2500000, 90000000, 666000, 85000000,
           17000000, 10000000, 22000000, 18000000, 8200000, 60000000, 45000000, 858000, 17000000, 300000, 
           10000000, 6400000, 13000000, 1750000, 1700000, 3000000]

The 'g_budget' list.

In [98]:
print(g_budget)
[20000000, 23000000, 15000000, 2700000, 70000000, 30000000, 2500000, 90000000, 666000, 85000000, 17000000, 10000000, 22000000, 18000000, 8200000, 60000000, 45000000, 858000, 17000000, 300000, 10000000, 6400000, 13000000, 1750000, 1700000, 3000000]

Checking the number of elements in the 'g_budget' list.

In [145]:
len(g_budget)
Out[145]:
26

These are the domestic gross figures of the new 'G-rated' movies that will be stored in the 'g_domestic' list, for the 'Domestic_Gross' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [309]:
g_domestic =[206333165, 51764950, 25167002, 0, 100138851, 63658910, 0, 18319860, 0, 82985708, 0, 0, 
            75600072, 0, 163214286, 50877145, 421785283, 102797000, 72000000, 0, 0, 30176619, 0, 7600000,
            5000000, 0]

The 'g_domestic' list.

In [101]:
print(g_domestic)
[206333165, 51764950, 25167002, 0, 100138851, 63658910, 0, 18319860, 0, 82985708, 0, 0, 75600072, 0, 163214286, 50877145, 421785283, 102797000, 72000000, 0, 0, 30176619, 0, 7600000, 5000000, 0]

Checking the number of elements in the 'g_domestic' list.

In [146]:
len(g_domestic)
Out[146]:
26

These are the foreign gross of the new 'G-rated' movies that will be stored in the 'g_foreign' list, for the 'Foreign_Gross' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [310]:
g_foreign = [232323678, 15183000, 1302619, 0, 225361149, 182441090, 0, 50812000, 0, 61000000, 0,
                  0, 4891444, 0, 122999909, 39605172, 564429585, 165203000, 71636, 0, 0, 15790, 0, 377,
                  7000000, 0]

The 'g_foreign' list.

In [104]:
print(g_foreign)
[232323678, 15183000, 1302619, 0, 225361149, 182441090, 0, 50812000, 0, 61000000, 0, 0, 4891444, 0, 122999909, 39605172, 564429585, 165203000, 71636, 0, 0, 15790, 0, 377, 7000000, 0]

Checking the number of elements in the 'g_foreign' list.

In [147]:
len(g_foreign)
Out[147]:
26

These are the worldwide gross of the new 'G-rated' movies for the 'Worldwide_Gross' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [311]:
g_worldwide = [438656843, 66947950, 27469621, 37799643, 325500000, 246100000, 3750000, 69131860,
               4517000, 143985708, 10015449, 17657973, 80491516, 311281000, 286214195, 90482317, 986214868,
              268000000, 72071636, 108998, 47707417, 30194409, 65500000, 7600377, 12000000, 592861]

The 'g_worldwide' list.

In [313]:
print(g_worldwide)
[438656843, 66947950, 27469621, 37799643, 325500000, 246100000, 3750000, 69131860, 4517000, 143985708, 10015449, 17657973, 80491516, 311281000, 286214195, 90482317, 986214868, 268000000, 72071636, 108998, 47707417, 30194409, 65500000, 7600377, 12000000, 592861]

Checking the number of elements in the 'g_worldwide' list.

In [148]:
len(g_worldwide)
Out[148]:
26

These are the runtime of the new 'G-rated' movies for the 'Runtime' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [315]:
g_runtime = [84, 83, 103, 118, 91, 91, 134, 90, 89, 97, 97, 105, 128, 101, 174, 87, 87, 70, 175, 84, 90,
            200, 220, 129, 102, 91]

The 'g_runtime' list.

In [316]:
print(g_runtime)
[84, 83, 103, 118, 91, 91, 134, 90, 89, 97, 97, 105, 128, 101, 174, 87, 87, 70, 175, 84, 90, 200, 220, 129, 102, 91]

Checking the number of elements in the 'g_runtime' list.

In [149]:
len(g_runtime)
Out[149]:
26

These are the average ratings (scores) of the new 'G-rated' movies for the 'Averagerating' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [318]:
g_rating = [8.0, 6.3, 6.5, 7.4, 7.0, 9.6, 9.0, 5.8, 7.1, 6.3, 7.6, 6.5, 6.9, 7.3, 8.1, 6.1, 8.5, 7.3, 7.8,
           6.6, 8.1, 7.6, 7.9, 7.7, 6.3, 7.1]

The 'g_rating' list.

In [319]:
print(g_rating)
[8.0, 6.3, 6.5, 7.4, 7.0, 9.6, 9.0, 5.8, 7.1, 6.3, 7.6, 6.5, 6.9, 7.3, 8.1, 6.1, 8.5, 7.3, 7.8, 6.6, 8.1, 7.6, 7.9, 7.7, 6.3, 7.1]

Checking the number of elements in the 'g_rating' list.

In [150]:
len(g_rating)
Out[150]:
26

These are the names of the production company of the new 'G-rated' movies for the 'Company' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [321]:
g_company = ['The Walt Disney Company','Universal Pictures', 'Fox 2000 Pictures', 'Omni Zoetrope', 
             'The Walt Disney Company', 'Universal Pictures', 'The Walt Disney Company', 
             'Universal Pictures', 'Metro-Goldwyn-Mayer', 'Hanna-Barber-Productions', 
             'Warner Bros. Pictures', 'Picturehouse', 'The Walt Disney Company', 'American Zoetrope',
             '20th Century Studios', 'Universal Pictures', 'The Walt Disney Company', 
             'The Walt Disney Company', 'Warner Bros. Pictures', 'NaN', 'Inferno Distribution',
             'Warner Bros. Pictures', 'Motion Picture Associates', 'Republic Pictures', 
             '20th Century Studios', 'United Motion Pictures']

The 'g_company' list.

In [322]:
print(g_company)
['The Walt Disney Company', 'Universal Pictures', 'Fox 2000 Pictures', 'Omni Zoetrope', 'The Walt Disney Company', 'Universal Pictures', 'The Walt Disney Company', 'Universal Pictures', 'Metro-Goldwyn-Mayer', 'Hanna-Barber-Productions', 'Warner Bros. Pictures', 'Picturehouse', 'The Walt Disney Company', 'American Zoetrope', '20th Century Studios', 'Universal Pictures', 'The Walt Disney Company', 'The Walt Disney Company', 'Warner Bros. Pictures', 'NaN', 'Inferno Distribution', 'Warner Bros. Pictures', 'Motion Picture Associates', 'Republic Pictures', '20th Century Studios', 'United Motion Pictures']

Checking the number of elements in the 'g_company' list.

In [151]:
len(g_company)
Out[151]:
26

These are the names of the directors of the new 'G-rated' movies for the 'Director' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [324]:
g_director = ['Gary Trousdale', 'Penelope Spheeris', 'Elizabeth Allen Rosenbaum', 'Carroll Ballard',
             'Gary Trousdale', 'Chris Noonan', 'David Swift', 'George Miller', 'Fred M. Wilcox',
             'Gary Winick', 'Alfonso Cuaron', 'Patricia Rozema', 'John Lee Hancock', 'Agnieszka Holland',
             'Robert Wise', 'Sam Fell', 'Rob Minkoff', 'John Hubley', 'Cecil Beaton', 'Brent Miller',
             'Lasse Hallstrom', 'George Stevens', 'Cecil B. Demille', 'John Ford', 'Jean Negulesco',
             'Ladislao Vajda']

The 'g_director' list.

In [325]:
print(g_director)
['Gary Trousdale', 'Penelope Spheeris', 'Elizabeth Allen Rosenbaum', 'Carroll Ballard', 'Gary Trousdale', 'Chris Noonan', 'David Swift', 'George Miller', 'Fred M. Wilcox', 'Gary Winick', 'Alfonso Cuaron', 'Patricia Rozema', 'John Lee Hancock', 'Agnieszka Holland', 'Robert Wise', 'Sam Fell', 'Rob Minkoff', 'John Hubley', 'Cecil Beaton', 'Brent Miller', 'Lasse Hallstrom', 'George Stevens', 'Cecil B. Demille', 'John Ford', 'Jean Negulesco', 'Ladislao Vajda']

Checking the number of elements in the 'g_director' list.

In [152]:
len(g_director)
Out[152]:
26

These are the names of the starring actors of the new 'G-rated' movies for the 'Star' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [326]:
g_star  = ['Paige O\'Hara', 'Brittany Ashton', 'Joey King', 'Kelly Reno', 'Demi Moore', 'James Cromwell',
           'Hayley Mills', 'James Cromwell', 'Roddy McDowall', 'Dakota Fanning', 'Liesel Matthews',
          'Abigail Breslin', 'Dennis Quaid', 'Kate Maberly', 'Julie Andrews', 'Matthew Broderick',
          'James Earl Jones', 'Donnie Dunagan', 'Audrey Hepburn', 'Gemma Rizzuto', 'Richard Gere',
          'James Dean', 'Yul Brynner', 'John Wayne', 'Jean Peters', 'Pablito Calvo']
len(g_star)
Out[326]:
26

The 'g_star' list.

In [327]:
print(g_star)
["Paige O'Hara", 'Brittany Ashton', 'Joey King', 'Kelly Reno', 'Demi Moore', 'James Cromwell', 'Hayley Mills', 'James Cromwell', 'Roddy McDowall', 'Dakota Fanning', 'Liesel Matthews', 'Abigail Breslin', 'Dennis Quaid', 'Kate Maberly', 'Julie Andrews', 'Matthew Broderick', 'James Earl Jones', 'Donnie Dunagan', 'Audrey Hepburn', 'Gemma Rizzuto', 'Richard Gere', 'James Dean', 'Yul Brynner', 'John Wayne', 'Jean Peters', 'Pablito Calvo']

This is for the 'Genre' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [328]:
g_genre = []
for i in range(26):g_genre.append('Drama')
print(g_genre) #showing the g_genre list 
['Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama']

Checking the number of elements in the 'g_genre' list.

In [329]:
len(g_genre)
Out[329]:
26

This is for the 'Rating' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [330]:
g_rated = []
for i in range(26):g_rated.append('G')
print(g_rated) #showing the g_rated list 
['G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G']

Checking the number of elements in the 'g_rated' list.

In [155]:
len(g_rated)
Out[155]:
26
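
When every row takes the same value, as with the 'Genre' and 'Rating' columns here, the list can also be built by repetition instead of a loop. This is a sketch only; n_movies, sample_genre and sample_rated are illustrative names, with 26 taken from the length checks above.

```python
n_movies = 26  # number of new G-rated movies, matching the len() checks above

# Repeating a one-element list gives the same result as appending in a loop.
sample_genre = ["Drama"] * n_movies
sample_rated = ["G"] * n_movies

print(len(sample_genre), len(sample_rated))  # 26 26
print(sample_rated[:3])                      # ['G', 'G', 'G']
```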

These are the Profit of the new 'G-rated' movies for the 'Profit' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame. This was calculated by subtracting the Budget of each movie from the Worldwide Gross.

In [331]:
g_profit = []
for x,y in enumerate(g_worldwide):
    g_profit.append(y-g_budget[x])
print(g_profit) #showing the g_profit list 
[418656843, 43947950, 12469621, 35099643, 255500000, 216100000, 1250000, -20868140, 3851000, 58985708, -6984551, 7657973, 58491516, 293281000, 278014195, 30482317, 941214868, 267142000, 55071636, -191002, 37707417, 23794409, 52500000, 5850377, 10300000, -2407139]

Checking the number of elements in the 'g_profit' list.

In [332]:
len(g_profit)
Out[332]:
26

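These are the estimated numbers of tickets sold for the new 'G-rated' movies, for the 'Tickets' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame. Each value is the worldwide gross divided by 10 and rounded to the nearest whole number.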

In [333]:
g_tickets = []
for i in g_worldwide:
    g_tickets.append(round(i/10))
print(g_tickets) #showing the g_tickets list 
[43865684, 6694795, 2746962, 3779964, 32550000, 24610000, 375000, 6913186, 451700, 14398571, 1001545, 1765797, 8049152, 31128100, 28621420, 9048232, 98621487, 26800000, 7207164, 10900, 4770742, 3019441, 6550000, 760038, 1200000, 59286]

Checking the number of elements in the 'g_tickets' list.

In [334]:
len(g_tickets)
Out[334]:
26

Creating the Production_Budget_x column by turning the g_budget list into currency. For the g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [335]:
g_budget_x = []
for i in g_budget:
    g_budget_x.append("${:,.0f}".format(i))
print(g_budget_x) #showing the g_budget_x list 
['$20,000,000', '$23,000,000', '$15,000,000', '$2,700,000', '$70,000,000', '$30,000,000', '$2,500,000', '$90,000,000', '$666,000', '$85,000,000', '$17,000,000', '$10,000,000', '$22,000,000', '$18,000,000', '$8,200,000', '$60,000,000', '$45,000,000', '$858,000', '$17,000,000', '$300,000', '$10,000,000', '$6,400,000', '$13,000,000', '$1,750,000', '$1,700,000', '$3,000,000']

Checking the number of elements in the 'g_budget_x' list.

In [158]:
len(g_budget_x)
Out[158]:
26

Creating the Domestic_Gross_x column by turning the g_domestic list into currency. For the g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [337]:
g_domestic_x = []
for i in g_domestic:
    g_domestic_x.append("${:,.0f}".format(i))
print(g_domestic_x) #showing the g_domestic_x list 
['$206,333,165', '$51,764,950', '$25,167,002', '$0', '$100,138,851', '$63,658,910', '$0', '$18,319,860', '$0', '$82,985,708', '$0', '$0', '$75,600,072', '$0', '$163,214,286', '$50,877,145', '$421,785,283', '$102,797,000', '$72,000,000', '$0', '$0', '$30,176,619', '$0', '$7,600,000', '$5,000,000', '$0']

Checking the number of elements in the 'g_domestic_x' list.

In [338]:
len(g_domestic_x)
Out[338]:
26

Creating the Foreign_Gross_x column by turning the g_foreign list into currency. For the g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [339]:
g_foreign_x = []
for i in g_foreign:
    g_foreign_x.append("${:,.0f}".format(i))
print(g_foreign_x) #showing the g_foreign_x list 
['$232,323,678', '$15,183,000', '$1,302,619', '$0', '$225,361,149', '$182,441,090', '$0', '$50,812,000', '$0', '$61,000,000', '$0', '$0', '$4,891,444', '$0', '$122,999,909', '$39,605,172', '$564,429,585', '$165,203,000', '$71,636', '$0', '$0', '$15,790', '$0', '$377', '$7,000,000', '$0']

Checking the number of elements in the 'g_foreign_x' list.

In [340]:
len(g_foreign_x)
Out[340]:
26

Creating the Worldwide_Gross_x column by turning the g_worldwide list into currency. For the g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [341]:
g_worldwide_x = []
for i in g_worldwide:
    g_worldwide_x.append("${:,.0f}".format(i))
print(g_worldwide_x) #showing the g_worldwide_x list 
['$438,656,843', '$66,947,950', '$27,469,621', '$37,799,643', '$325,500,000', '$246,100,000', '$3,750,000', '$69,131,860', '$4,517,000', '$143,985,708', '$10,015,449', '$17,657,973', '$80,491,516', '$311,281,000', '$286,214,195', '$90,482,317', '$986,214,868', '$268,000,000', '$72,071,636', '$108,998', '$47,707,417', '$30,194,409', '$65,500,000', '$7,600,377', '$12,000,000', '$592,861']

Checking the number of elements in the 'g_worldwide_x' list.

In [161]:
len(g_worldwide_x)
Out[161]:
26

Creating the Profit_x column by formatting the g_profit list as currency strings. This column is for the g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [342]:
g_profit_x = []
for i in g_profit:
    g_profit_x.append("${:,.0f}".format(i))
print(g_profit_x) #showing the g_profit_x list 
['$418,656,843', '$43,947,950', '$12,469,621', '$35,099,643', '$255,500,000', '$216,100,000', '$1,250,000', '$-20,868,140', '$3,851,000', '$58,985,708', '$-6,984,551', '$7,657,973', '$58,491,516', '$293,281,000', '$278,014,195', '$30,482,317', '$941,214,868', '$267,142,000', '$55,071,636', '$-191,002', '$37,707,417', '$23,794,409', '$52,500,000', '$5,850,377', '$10,300,000', '$-2,407,139']

Checking the number of elements in the 'g_profit_x' list.

In [162]:
len(g_profit_x)
Out[162]:
26
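
The format string used above puts the minus sign after the dollar sign for losses (for example '$-20,868,140'). A minimal sketch of an alternative is shown here; `format_currency` is a hypothetical helper, not part of the original notebook, and the sample values are copied from the g_profit output above.

```python
def format_currency(amount):
    """Format a number as whole dollars, placing any minus sign before the '$'."""
    sign = "-" if amount < 0 else ""
    return "{}${:,.0f}".format(sign, abs(amount))


# Sample values taken from the profit figures shown above.
sample_profit = [418656843, -20868140, -191002]
print([format_currency(p) for p in sample_profit])
```

Whether '-$20,868,140' or '$-20,868,140' is preferable is a presentation choice; the numeric Profit column is unaffected either way.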

Creating the Tickets_x column by formatting the g_tickets list as comma-separated strings. This column is for the g_dataframe dataframe that will be appended to the Drama_DataFrame.

In [343]:
g_tickets_x = []
for i in g_tickets:
    g_tickets_x.append("{:,.0f}".format(i))
print(g_tickets_x) #showing the g_tickets_x list 
['43,865,684', '6,694,795', '2,746,962', '3,779,964', '32,550,000', '24,610,000', '375,000', '6,913,186', '451,700', '14,398,571', '1,001,545', '1,765,797', '8,049,152', '31,128,100', '28,621,420', '9,048,232', '98,621,487', '26,800,000', '7,207,164', '10,900', '4,770,742', '3,019,441', '6,550,000', '760,038', '1,200,000', '59,286']

Checking the number of elements in the 'g_tickets_x' list.

In [344]:
len(g_tickets_x)
Out[344]:
26

Creating the g_dataframe dataframe.

In [345]:
g_dataframe =  pd.DataFrame({"Movie":g_name, "Release_Date":g_date, "Genre":g_genre, "Rating":g_rated, 
                            "Production_Budget":g_budget, "Production_Budget_x":g_budget_x, 
                            "Domestic_Gross":g_domestic, "Domestic_Gross_x":g_domestic_x, 
                            "Foreign_Gross":g_foreign, "Foreign_Gross_x":g_foreign_x, 
                            "Worldwide_Gross":g_worldwide, "Worldwide_Gross_x":g_worldwide_x, 
                            "Profit":g_profit, "Profit_x":g_profit_x, "Tickets":g_tickets, 
                            "Tickets_x":g_tickets_x, "Runtime":g_runtime, "Averagerating":g_rating, 
                            "Company":g_company, "Star":g_star, "Director":g_director, "Writer":g_writer
                             })
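
The repeated len() checks above all serve one purpose: every list must have the same number of elements before the dataframe is built. A possible way to check this in one step is sketched below. `assert_equal_lengths` and the miniature `demo_columns` dictionary are hypothetical stand-ins, not names from the notebook.

```python
def assert_equal_lengths(columns):
    """Raise ValueError if the lists in `columns` do not all share one length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError("Column lengths differ: {}".format(lengths))
    # All lengths agree (or the dict is empty), so return the common length.
    return next(iter(lengths.values()), 0)


# Hypothetical miniature columns standing in for lists such as g_name and g_budget.
demo_columns = {
    "Movie": ["Film A", "Film B", "Film C"],
    "Production_Budget": [20000000, 23000000, 15000000],
    "Profit_x": ["$418,656,843", "$43,947,950", "$12,469,621"],
}
print(assert_equal_lengths(demo_columns))
```

The error message lists the length of every column, which makes it easier to see which list is short or long.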

The first five rows of the g_dataframe dataframe.

In [346]:
g_dataframe.head()
Out[346]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
0 Beauty and the Beast 1991 Novemeber 22, 1991 Drama G 20000000 $20,000,000 206333165 $206,333,165 232323678 $232,323,678 ... 418656843 $418,656,843 43865684 43,865,684 84 8.0 The Walt Disney Company Paige O'Hara Gary Trousdale Linda Woolverton
1 The Little Rascals August 5, 1994 Drama G 23000000 $23,000,000 51764950 $51,764,950 15183000 $15,183,000 ... 43947950 $43,947,950 6694795 6,694,795 83 6.3 Universal Pictures Brittany Ashton Penelope Spheeris Penelope Spheeris
2 Ramona and Beezus July 23, 2010 Drama G 15000000 $15,000,000 25167002 $25,167,002 1302619 $1,302,619 ... 12469621 $12,469,621 2746962 2,746,962 103 6.5 Fox 2000 Pictures Joey King Elizabeth Allen Ressenbaum Beverly Cleary
3 The Black Stallion October 17, 1979 Drama G 2700000 $2,700,000 0 $0 0 $0 ... 35099643 $35,099,643 3779964 3,779,964 118 7.4 Omni Zoetrope Kelly Reno Carral Ballard Melissa Mathison
4 The Hunchback of Notre Drame June 21, 1996 Drama G 70000000 $70,000,000 100138851 $100,138,851 225361149 $225,361,149 ... 255500000 $255,500,000 32550000 32,550,000 91 7.0 The Walt Disney Company Demi Moore Gary Trouside Victor Hugo

5 rows × 22 columns

Appending the g_dataframe dataframe to the Drama_DataFrame.

In [347]:
Drama_DataFrame = pd.concat([Drama_DataFrame, g_dataframe], ignore_index=True)

The 'NC-17' rating does not have enough movies to be analyzed, so more 'NC-17' rated movies will be added to the Drama_DataFrame for appropriate analysis.

These are the names of the new 'NC-17 rated' movies for the 'Movie' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [9]:
nc17_name = ['Showgirls', 'The Dreamers', 'Shame', 'Blue Is the Warmest Colour', 'Blue Valentine',
            'Two Girls and a Guy', 'Elles', 'Hell', 'Killer Joe', 'Se, jie', 'Queen of Hearts', 
            'The Evil Dead', 'Man Bites Dog', 'Shame', 'Nymphomaniac: Vol. I', 'Arabian Nights',
            'Frontier(s)', 'Chained', 'Natural Born Killers', 'Clerks', 'Bad Lieutenant', 
            'The Big Feast', 'Beyond the Valley of the Dolls', 'Kids', 'Crash', 'Last Tango in Paris',
            'Pink Flamingos', 'Lust, Caution ', 'Happiness 1998', 'Orgazmo', 'A Dirty Shame', 
            'Young Adam', 'Whore 1991', 'Ma Mère', 'Law of Desire' ]
print(nc17_name) #showing the nc17_name list 
['Showgirls', 'The Dreamers', 'Shame', 'Blue Is the Warmest Colour', 'Blue Valentine', 'Two Girls and a Guy', 'Elles', 'Hell', 'Killer Joe', 'Se, jie', 'Queen of Hearts', 'The Evil Dead', 'Man Bites Dog', 'Shame', 'Nymphomaniac: Vol. I', 'Arabian Nights', 'Frontier(s)', 'Chained', 'Natural Born Killers', 'Clerks', 'Bad Lieutenant', 'The Big Feast', 'Beyond the Valley of the Dolls', 'Kids', 'Crash', 'Last Tango in Paris', 'Pink Flamingos', 'Lust, Caution ', 'Happiness 1998', 'Orgazmo', 'A Dirty Shame', 'Young Adam', 'Whore 1991', 'Ma Mère', 'Law of Desire']

Checking the number of elements in the 'nc17_name' list.

In [167]:
len(nc17_name)
Out[167]:
35
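
The len() check confirms the size of the list but not that each title is unique; 'Shame' appears twice in nc17_name. A quick sketch of a duplicate check follows. `find_duplicates` is a hypothetical helper, and `sample_titles` is only a short excerpt of the list above.

```python
from collections import Counter


def find_duplicates(titles):
    """Return titles that occur more than once, ignoring surrounding whitespace."""
    counts = Counter(title.strip() for title in titles)
    return {title: n for title, n in counts.items() if n > 1}


# A short excerpt of nc17_name, enough to show the repeated entry.
sample_titles = ['Showgirls', 'The Dreamers', 'Shame', 'Man Bites Dog', 'Shame', 'Lust, Caution ']
print(find_duplicates(sample_titles))
```

This only catches exact repeats; the same film listed under two different titles would need a manual review.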

These are the names of the directors of the new 'NC-17 rated' movies for the 'Director' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [10]:
nc17_director = ['Paul Verhoeven', 'Bernardo Bertolucci', 'Steve McQueen', 'Abdellatif Kechiche', 
                 'Derek Cianfrance', 'James Toback', 'Małgośka Szumowska', 'Tim Fehlbaum', 
                 'William Friedkin','Ang Lee', 'May el-Toukhy', 'Sam Raimi', 'Rémy Belvaux', 
                 'Steve McQueen', 'Lars von Trier', 'Pier Paolo Pasolini', 'Xavier Gens',
                 'Jennifer Lynch', 'Oliver Stone', 'Kevin Smith','Abel Ferrara','John Gulager',
                 'Russ Meyer', 'Larry Clark', 'Paul Haggis', 'Bernardo Bertolucci','John Waters',
                 'Ang Lee', 'Todd Solondz',  'Trey Parker', 'John Waters', 'David Mackenzie',
                 'Ken Russell', 'Christophe Honoré', 'Pedro Almodóvar']
print(nc17_director) #showing the nc17_director list 
['Paul Verhoeven', 'Bernardo Bertolucci', 'Steve McQueen', 'Abdellatif Kechiche', 'Derek Cianfrance', 'James Toback', 'Małgośka Szumowska', 'Tim Fehlbaum', 'William Friedkin', 'Ang Lee', 'May el-Toukhy', 'Sam Raimi', 'Rémy Belvaux', 'Steve McQueen', 'Lars von Trier', 'Pier Paolo Pasolini', 'Xavier Gens', 'Jennifer Lynch', 'Oliver Stone', 'Kevin Smith', 'Abel Ferrara', 'John Gulager', 'Russ Meyer', 'Larry Clark', 'Paul Haggis', 'Bernardo Bertolucci', 'John Waters', 'Ang Lee', 'Todd Solondz', 'Trey Parker', 'John Waters', 'David Mackenzie', 'Ken Russell', 'Christophe Honoré', 'Pedro Almodóvar']

Checking the number of elements in the 'nc17_director' list.

In [168]:
len(nc17_director)
Out[168]:
35

These are the writers of the new 'NC-17 rated' movies for the 'Writer' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [11]:
nc17_writer = ['Joe Eszterhas', 'Gilbert Adair', 'Abi Morgan', 'Ghalia Lacroix', 'Cami Delavigne', 
               'James Toback', 'Tine Byrckel', 'Tim Fehlbaum', 'Tracy Letts', 'Hui-Ling Wang', 
               'Maren Louise Käehne', 'Sam Raimi', 'André Bonzel', 'Abi Morgan', 'Lars von Trier',
               'Dacia Maraini', 'Xavier Gens', 'Jennifer Lynch', 'Oliver Stone', 'Kevin Smith', 
               'Abel Ferrara', 'Patrick Melton', 'Roger Ebert', 'Harmony Korine', 'Paul Haggis', 
               'Franco Arcalli', 'John Waters', 'Hui-Ling Wang', 'Todd Solondz', 'Trey Parker', 
               'John Waters', 'David Mackenzie', 'Deborah Dalton', 'Christophe Honoré',
               'Pedro Almodóvar']
print(nc17_writer) #showing the nc17_writer list 
['Joe Eszterhas', 'Gilbert Adair', 'Abi Morgan', 'Ghalia Lacroix', 'Cami Delavigne', 'James Toback', 'Tine Byrckel', 'Tim Fehlbaum', 'Tracy Letts', 'Hui-Ling Wang', 'Maren Louise Käehne', 'Sam Raimi', 'André Bonzel', 'Abi Morgan', 'Lars von Trier', 'Dacia Maraini', 'Xavier Gens', 'Jennifer Lynch', 'Oliver Stone', 'Kevin Smith', 'Abel Ferrara', 'Patrick Melton', 'Roger Ebert', 'Harmony Korine', 'Paul Haggis', 'Franco Arcalli', 'John Waters', 'Hui-Ling Wang', 'Todd Solondz', 'Trey Parker', 'John Waters', 'David Mackenzie', 'Deborah Dalton', 'Christophe Honoré', 'Pedro Almodóvar']

Checking the number of elements in the 'nc17_writer' list.

In [169]:
len(nc17_writer)
Out[169]:
35

These are the release dates of the new 'NC-17 rated' movies for the 'Release_Date' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [12]:
nc17_date = ['September 22, 1995', 'February 6, 2004', 'December 2, 2011', 'October 25, 2013', 
             'December 29, 2010', 'September 9, 1997', 'April 27, 2012', '22 September 2011', 
             'July 27, 2012', 'September 28, 2007', 'November 1, 2019',  'October 15, 1981', 
             '15 January 1993', 'December 2, 2011', 'December 25, 2013',  'July 27, 1980', 'May 9, 2008',
             'August 5, 2012', 'August 26, 1994', 'October 19, 1994', 'November 20, 1992 ',
             'September 22, 2006', 'June 17, 1970', 'July 28, 1995', 'May 6, 2005', 'January 27, 1973',
             'March 17, 1972', 'September 28, 2007', 'October 16, 1998', 'October 23, 1998', 
             'September 24, 2004 ', 'April 16, 2004', 'October 4, 1991', 'May 13, 2005 ', 
             'April 3, 1987']
print(nc17_date) #showing the nc17_date list 
['September 22, 1995', 'February 6, 2004', 'December 2, 2011', 'October 25, 2013', 'December 29, 2010', 'September 9, 1997', 'April 27, 2012', '22 September 2011', 'July 27, 2012', 'September 28, 2007', 'November 1, 2019', 'October 15, 1981', '15 January 1993', 'December 2, 2011', 'December 25, 2013', 'July 27, 1980', 'May 9, 2008', 'August 5, 2012', 'August 26, 1994', 'October 19, 1994', 'November 20, 1992 ', 'September 22, 2006', 'June 17, 1970', 'July 28, 1995', 'May 6, 2005', 'January 27, 1973', 'March 17, 1972', 'September 28, 2007', 'October 16, 1998', 'October 23, 1998', 'September 24, 2004 ', 'April 16, 2004', 'October 4, 1991', 'May 13, 2005 ', 'April 3, 1987']

Checking the number of elements in the 'nc17_date' list.

In [170]:
len(nc17_date)
Out[170]:
35
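
Some entries in nc17_date use day-first order ('22 September 2011') or carry a trailing space ('November 20, 1992 '). The sketch below shows one way to bring such strings into a single layout using only the standard library. `normalize_date` and `DATE_FORMATS` are hypothetical names, and the code assumes English month names and only the two layouts seen above.

```python
from datetime import datetime

# Assumed input layouts: month-first ('July 27, 2012') and day-first ('22 September 2011').
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y")


def normalize_date(text):
    """Return `text` as 'Month D, YYYY', or raise ValueError if no layout matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # try the next layout
        return "{} {}, {}".format(parsed.strftime("%B"), parsed.day, parsed.year)
    raise ValueError("Unrecognised date: {!r}".format(text))


sample_dates = ['September 22, 1995', '22 September 2011', 'November 20, 1992 ']
print([normalize_date(d) for d in sample_dates])
```

Raising on an unrecognised string, rather than passing it through, keeps a mistyped date from slipping into the Release_Date column unnoticed.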

These are the names of the starring actors of the new 'NC-17 rated' movies for the 'Star' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [13]:
nc17_star = ['Elizabeth Berkley', 'Eva Green', 'Michael Fassbender', 'Léa Seydoux', 'Ryan Gosling', 
             'Robert Downey Jr.', 'Juliette Binoche', 'Lisa Vicari', 'Juno Temple', 'Tony Leung Chiu-Wai', 
             'Trine Dyrholm',  'Bruce Campbell', 'Benoît Poelvoorde', 'Michael Fassbender',
             'Charlotte Gainsbourg', 'Ninetto Davoli', 'Karina Testa',  'Vincent D\'Onofrio', 
             'Woody Harrelson', 'Kevin Smith', 'Harvey Keitel', 'Clu Gulager', 'Marcia McBroom', 
             'Leo Fitzpatrick', 'Sandra Bullock', 'Marlon Brando', 'David Lochary', 
             'Tony Leung Chiu-Wai', 'Elizabeth Ashley', 'Michael Dean Jacobs', 'Suzanne Shepherd',
             'Tilda Swinton', 'Theresa Russell', 'Louis Garrel', 'Antonio Banderas']
print(nc17_star) #showing the nc17_star list 
['Elizabeth Berkley', 'Eva Green', 'Michael Fassbender', 'Léa Seydoux', 'Ryan Gosling', 'Robert Downey Jr.', 'Juliette Binoche', 'Lisa Vicari', 'Juno Temple', 'Tony Leung Chiu-Wai', 'Trine Dyrholm', 'Bruce Campbell', 'Benoît Poelvoorde', 'Michael Fassbender', 'Charlotte Gainsbourg', 'Ninetto Davoli', 'Karina Testa', "Vincent D'Onofrio", 'Woody Harrelson', 'Kevin Smith', 'Harvey Keitel', 'Clu Gulager', 'Marcia McBroom', 'Leo Fitzpatrick', 'Sandra Bullock', 'Marlon Brando', 'David Lochary', 'Tony Leung Chiu-Wai', 'Elizabeth Ashley', 'Michael Dean Jacobs', 'Suzanne Shepherd', 'Tilda Swinton', 'Theresa Russell', 'Louis Garrel', 'Antonio Banderas']

Checking the number of elements in the 'nc17_star' list.

In [356]:
len(nc17_star)
Out[356]:
35

These are the names of the production companies of the new 'NC-17 rated' movies for the 'Company' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [20]:
nc17_company = ['Carolco Pictures', 'Recorded Picture Company', 'Film4', 'Wild Bunch', 'Hunting Lane Films',
               'Edward R. Pressman', 'Slot Machine', 'Caligari Film', 'Voltage Pictures', 
               'River Road Entertainment', 'Nordisk Film',  'Renaissance Pictures', 
               'Les Artistes Anonymes', 'Film4', 'Zentropa Entertainments', 'United Artists',
               'BR Films', 'Anchor Bay Entertainment', 'Regency Enterprises', 
               'View Askew Productions', 'Aries Films', 'LivePlanet', '20th Century Fox',
               'Independent Pictures', 'Bob Yari Productions', 'Produzioni Europee', 'Dreamland',
               'River Road Entertainment', 'Killer Films','Avenging Conscience', 'Killer Films',
               'Recorded Picture Company', 'Cheap Date', 'Gemini Films', 'El Deseo']
print(nc17_company) #showing the nc17_company list 
['Carolco Pictures', 'Recorded Picture Company', 'Film4', 'Wild Bunch', 'Hunting Lane Films', 'Edward R. Pressman', 'Slot Machine', 'Caligari Film', 'Voltage Pictures', 'River Road Entertainment', 'Nordisk Film', 'Renaissance Pictures', 'Les Artistes Anonymes', 'Film4', 'Zentropa Entertainments', 'United Artists', 'BR Films', 'Anchor Bay Entertainment', 'Regency Enterprises', 'View Askew Productions', 'Aries Films', 'LivePlanet', '20th Century Fox', 'Independent Pictures', 'Bob Yari Productions', 'Produzioni Europee', 'Dreamland', 'River Road Entertainment', 'Killer Films', 'Avenging Conscience', 'Killer Films', 'Recorded Picture Company', 'Cheap Date', 'Gemini Films', 'El Deseo']

Checking the number of elements in the 'nc17_company' list.

In [172]:
len(nc17_company)
Out[172]:
35

These are the production budgets of the new 'NC-17 rated' movies for the 'Production_Budget' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [19]:
nc17_budget = [45000000, 15000000, 6500000, 4074940, 1000000, 1000000, 3565572, 12000000, 10000000, 
              15000000, 19000000, 350000, 1000000, 6500000, 4700000, 904765, 3000000,
              700000, 34000000, 230000, 1000000, 3200000, 1000000, 1500000, 6500000,
              1250000, 12000, 15000000, 2200000, 1300000, 15000000, 6400000, 50000, 3259572,
              612072]
print(nc17_budget) #showing the nc17_budget list 
[45000000, 15000000, 6500000, 4074940, 1000000, 1000000, 3565572, 12000000, 10000000, 15000000, 19000000, 350000, 1000000, 6500000, 4700000, 904765, 3000000, 700000, 34000000, 230000, 1000000, 3200000, 1000000, 1500000, 6500000, 1250000, 12000, 15000000, 2200000, 1300000, 15000000, 6400000, 50000, 3259572, 612072]

Checking the number of elements in the 'nc17_budget' list.

In [359]:
len(nc17_budget)
Out[359]:
35

These are the domestic gross figures of the new 'NC-17 rated' movies for the 'Domestic_Gross' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [18]:
nc17_domestic = [20350754, 2531462, 4002293, 2199787, 9737892, 2057193, 157508, 169705587, 1291645,
                4604982, 0, 2400000, 0, 4002293, 785896, 0, 97182, 0, 50282766, 3073428, 2000022, 
                56131, 0, 7412216, 55334418, 36144824, 0, 4604982, 2746453, 582024, 1339668, 767373,
                0, 71616, 0 ]
print(nc17_domestic) #showing the nc17_domestic list 
[20350754, 2531462, 4002293, 2199787, 9737892, 2057193, 157508, 169705587, 1291645, 4604982, 0, 2400000, 0, 4002293, 785896, 0, 97182, 0, 50282766, 3073428, 2000022, 56131, 0, 7412216, 55334418, 36144824, 0, 4604982, 2746453, 582024, 1339668, 767373, 0, 71616, 0]

Checking the number of elements in the 'nc17_domestic' list.

In [174]:
len(nc17_domestic)
Out[174]:
35

These are the foreign gross figures of the new 'NC-17 rated' movies for the 'Foreign_Gross' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [17]:
nc17_foreign = [17400000, 12775651, 16410548, 17266048, 6828348, 257833, 3664733, 43414417, 3367465, 
               60562448, 0, 261944, 0, 16410548, 1308406, 0, 2686353, 0, 797, 820812, 38894, 634741, 0,
               13000000, 45838620, 2887, 0, 60562448, 3000000, 45263, 574498, 1794447, 0, 950532, 0]
print(nc17_foreign) #showing the nc17_foreign list 
[17400000, 12775651, 16410548, 17266048, 6828348, 257833, 3664733, 43414417, 3367465, 60562448, 0, 261944, 0, 16410548, 1308406, 0, 2686353, 0, 797, 820812, 38894, 634741, 0, 13000000, 45838620, 2887, 0, 60562448, 3000000, 45263, 574498, 1794447, 0, 950532, 0]

Checking the number of elements in the 'nc17_foreign' list.

In [365]:
len(nc17_foreign)
Out[365]:
35

These are the worldwide gross figures of the new 'NC-17 rated' movies for the 'Worldwide_Gross' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [16]:
nc17_worldwide = [37750754, 15307113, 20412841, 19465835, 16566240, 2315026, 3822241, 213120004, 4659110,
                 65167430, 1236844, 2661944, 205569, 20412841, 2094302, 3453416, 2783535, 103093, 
                 50283563, 3894240, 2038916, 690872, 9000000, 20412216, 101173038, 36147711	, 413802,
                 65167430, 5746453, 627287, 1914166, 2561820, 1008404, 1022148, 1470809]
print(nc17_worldwide) #showing the nc17_worldwide list 
[37750754, 15307113, 20412841, 19465835, 16566240, 2315026, 3822241, 213120004, 4659110, 65167430, 1236844, 2661944, 205569, 20412841, 2094302, 3453416, 2783535, 103093, 50283563, 3894240, 2038916, 690872, 9000000, 20412216, 101173038, 36147711, 413802, 65167430, 5746453, 627287, 1914166, 2561820, 1008404, 1022148, 1470809]

Checking the number of elements in the 'nc17_worldwide' list.

In [367]:
len(nc17_worldwide)
Out[367]:
35

These are the runtimes of the new 'NC-17 rated' movies for the 'Runtime' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [15]:
nc17_runtime = [97, 130, 101, 180, 120, 84, 99, 89, 102, 158, 187, 85, 96, 101, 145, 155, 108, 94, 119,
               62, 96, 95, 109, 91, 112, 129, 92, 158, 134, 95, 84, 98, 80, 110, 82]
print(nc17_runtime) #showing the nc17_runtime list 
[97, 130, 101, 180, 120, 84, 99, 89, 102, 158, 187, 85, 96, 101, 145, 155, 108, 94, 119, 62, 96, 95, 109, 91, 112, 129, 92, 158, 134, 95, 84, 98, 80, 110, 82]

Checking the number of elements in the 'nc17_runtime' list.

In [369]:
len(nc17_runtime)
Out[369]:
35

These are the average ratings of the new 'NC-17 rated' movies for the 'Averagerating' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [14]:
nc17_rating = [4.9, 7.1, 7.2, 7.7, 7.4, 5.5, 5.6, 5.9, 6.7, 7.5, 7.1, 7.4, 7.4, 7.2, 6.9, 6.7, 6.2, 6.4,
              7.2, 7.7, 7.0, 6.2, 6.1, 7.0, 7.8, 6.9, 6.0, 7.5, 7.7, 6.1, 5.1, 6.4, 5.5, 5.0, 7.1]
print(nc17_rating) #showing the nc17_rating list 
[4.9, 7.1, 7.2, 7.7, 7.4, 5.5, 5.6, 5.9, 6.7, 7.5, 7.1, 7.4, 7.4, 7.2, 6.9, 6.7, 6.2, 6.4, 7.2, 7.7, 7.0, 6.2, 6.1, 7.0, 7.8, 6.9, 6.0, 7.5, 7.7, 6.1, 5.1, 6.4, 5.5, 5.0, 7.1]

Checking the number of elements in the 'nc17_rating' list.

In [178]:
len(nc17_rating)
Out[178]:
35

This is for the 'Genre' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [371]:
nc17_genre = ['Drama'] * len(nc17_name)  # one 'Drama' entry per movie
print(nc17_genre) #showing the nc17_genre list 
['Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama']

Checking the number of elements in the 'nc17_genre' list.

In [372]:
len(nc17_genre)
Out[372]:
35

This is for the 'Rating' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [373]:
nc17_rated = ['NC-17'] * len(nc17_name)  # one 'NC-17' entry per movie
print(nc17_rated) #showing the nc17_rated list 
['NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17']

Checking the number of elements in the 'nc17_rated' list.

In [374]:
len(nc17_rated)
Out[374]:
35

Creating the Production_Budget_x column by formatting the nc17_budget list as currency strings. This column is for the nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [375]:
nc17_budget_x = []
for i in nc17_budget:
    nc17_budget_x.append("${:,.0f}".format(i))
print(nc17_budget_x) #showing the nc17_budget_x list 
['$45,000,000', '$15,000,000', '$6,500,000', '$4,074,940', '$1,000,000', '$1,000,000', '$3,565,572', '$12,000,000', '$10,000,000', '$15,000,000', '$19,000,000', '$350,000', '$1,000,000', '$6,500,000', '$4,700,000', '$904,765', '$3,000,000', '$700,000', '$34,000,000', '$230,000', '$1,000,000', '$3,200,000', '$1,000,000', '$1,500,000', '$6,500,000', '$1,250,000', '$12,000', '$15,000,000', '$2,200,000', '$1,300,000', '$15,000,000', '$6,400,000', '$50,000', '$3,259,572', '$612,072']

Checking the number of elements in the 'nc17_budget_x' list.

In [181]:
len(nc17_budget_x)
Out[181]:
35

Creating the Domestic_Gross_x column by formatting the nc17_domestic list as currency strings. This column is for the nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [376]:
nc17_domestic_x = []
for i in nc17_domestic:
    nc17_domestic_x.append("${:,.0f}".format(i))
print(nc17_domestic_x) #showing the nc17_domestic_x list 
['$20,350,754', '$2,531,462', '$4,002,293', '$2,199,787', '$9,737,892', '$2,057,193', '$157,508', '$169,705,587', '$1,291,645', '$4,604,982', '$0', '$2,400,000', '$0', '$4,002,293', '$785,896', '$0', '$97,182', '$0', '$50,282,766', '$3,073,428', '$2,000,022', '$56,131', '$0', '$7,412,216', '$55,334,418', '$36,144,824', '$0', '$4,604,982', '$2,746,453', '$582,024', '$1,339,668', '$767,373', '$0', '$71,616', '$0']

Checking the number of elements in the 'nc17_domestic_x' list.

In [182]:
len(nc17_domestic_x)
Out[182]:
35

Creating the Foreign_Gross_x column by formatting the nc17_foreign list as currency strings. This column is for the nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [377]:
nc17_foreign_x = []
for i in nc17_foreign:
    nc17_foreign_x.append("${:,.0f}".format(i))
print(nc17_foreign_x) #showing the nc17_foreign_x list 
['$17,400,000', '$12,775,651', '$16,410,548', '$17,266,048', '$6,828,348', '$257,833', '$3,664,733', '$43,414,417', '$3,367,465', '$60,562,448', '$0', '$261,944', '$0', '$16,410,548', '$1,308,406', '$0', '$2,686,353', '$0', '$797', '$820,812', '$38,894', '$634,741', '$0', '$13,000,000', '$45,838,620', '$2,887', '$0', '$60,562,448', '$3,000,000', '$45,263', '$574,498', '$1,794,447', '$0', '$950,532', '$0']

Checking the number of elements in the 'nc17_foreign_x' list.

In [183]:
len(nc17_foreign_x)
Out[183]:
35

Creating the Worldwide_Gross_x column by formatting the nc17_worldwide list as currency strings. This column is for the nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [378]:
nc17_worldwide_x = []
for i in nc17_worldwide:
    nc17_worldwide_x.append("${:,.0f}".format(i))
print(nc17_worldwide_x) #showing the nc17_worldwide_x list 
['$37,750,754', '$15,307,113', '$20,412,841', '$19,465,835', '$16,566,240', '$2,315,026', '$3,822,241', '$213,120,004', '$4,659,110', '$65,167,430', '$1,236,844', '$2,661,944', '$205,569', '$20,412,841', '$2,094,302', '$3,453,416', '$2,783,535', '$103,093', '$50,283,563', '$3,894,240', '$2,038,916', '$690,872', '$9,000,000', '$20,412,216', '$101,173,038', '$36,147,711', '$413,802', '$65,167,430', '$5,746,453', '$627,287', '$1,914,166', '$2,561,820', '$1,008,404', '$1,022,148', '$1,470,809']

Checking the number of elements in the 'nc17_worldwide_x' list.

In [184]:
len(nc17_worldwide_x)
Out[184]:
35

These are the profits of the new 'NC-17 rated' movies for the 'Profit' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame. Each profit was calculated by subtracting the movie's production budget from its worldwide gross.

In [380]:
nc17_profit = []
for gross, budget in zip(nc17_worldwide, nc17_budget):
    nc17_profit.append(gross - budget)  # profit = worldwide gross - production budget
print(nc17_profit) #showing the nc17_profit list 
[-7249246, 307113, 13912841, 15390895, 15566240, 1315026, 256669, 201120004, -5340890, 50167430, -17763156, 2311944, -794431, 13912841, -2605698, 2548651, -216465, -596907, 16283563, 3664240, 1038916, -2509128, 8000000, 18912216, 94673038, 34897711, 401802, 50167430, 3546453, -672713, -13085834, -3838180, 958404, -2237424, 858737]

Checking the number of elements in the 'nc17_profit' list.

In [185]:
len(nc17_profit)
Out[185]:
35

Creating the Profit_x column by formatting the nc17_profit list as currency strings. This column is for the nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [381]:
nc17_profit_x = []
for i in nc17_profit:
    nc17_profit_x.append("${:,.0f}".format(i))
print(nc17_profit_x) #showing the nc17_profit_x list 
['$-7,249,246', '$307,113', '$13,912,841', '$15,390,895', '$15,566,240', '$1,315,026', '$256,669', '$201,120,004', '$-5,340,890', '$50,167,430', '$-17,763,156', '$2,311,944', '$-794,431', '$13,912,841', '$-2,605,698', '$2,548,651', '$-216,465', '$-596,907', '$16,283,563', '$3,664,240', '$1,038,916', '$-2,509,128', '$8,000,000', '$18,912,216', '$94,673,038', '$34,897,711', '$401,802', '$50,167,430', '$3,546,453', '$-672,713', '$-13,085,834', '$-3,838,180', '$958,404', '$-2,237,424', '$858,737']

Checking the number of elements in the 'nc17_profit_x' list.

In [186]:
len(nc17_profit_x)
Out[186]:
35

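These are the estimated numbers of tickets sold for the new 'NC-17 rated' movies, for the 'Tickets' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame. Each estimate is the worldwide gross divided by 10, which amounts to assuming an average ticket price of $10.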

In [382]:
nc17_tickets = []
for i in nc17_worldwide:
    nc17_tickets.append(round(i/10))
print(nc17_tickets) #showing the nc17_tickets list 
[3775075, 1530711, 2041284, 1946584, 1656624, 231503, 382224, 21312000, 465911, 6516743, 123684, 266194, 20557, 2041284, 209430, 345342, 278354, 10309, 5028356, 389424, 203892, 69087, 900000, 2041222, 10117304, 3614771, 41380, 6516743, 574645, 62729, 191417, 256182, 100840, 102215, 147081]

Checking the number of elements in the 'nc17_tickets' list.

In [383]:
len(nc17_tickets)
Out[383]:
35
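
The divisor 10 in the ticket calculation above acts as a flat average ticket price. The sketch below makes that assumption an explicit parameter so it can be changed in one place. `estimate_tickets` and `avg_ticket_price` are hypothetical names; the default of 10 is taken from the code above, not from a cited price source.

```python
def estimate_tickets(worldwide_gross, avg_ticket_price=10):
    """Estimate tickets sold as gross divided by an assumed average ticket price."""
    if avg_ticket_price <= 0:
        raise ValueError("avg_ticket_price must be positive")
    return round(worldwide_gross / avg_ticket_price)


# First three values of nc17_worldwide.
sample_gross = [37750754, 15307113, 20412841]
print([estimate_tickets(g) for g in sample_gross])
```

Because ticket prices vary by year and country, these figures are rough estimates rather than actual admissions counts.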

Creating the Tickets_x column by formatting the nc17_tickets list as comma-separated strings. This column is for the nc17_dataframe dataframe that will be appended to the Drama_DataFrame.

In [384]:
nc17_tickets_x = []
for i in nc17_tickets:
    nc17_tickets_x.append("{:,.0f}".format(i))
print(nc17_tickets_x) #showing the nc17_tickets_x list 
['3,775,075', '1,530,711', '2,041,284', '1,946,584', '1,656,624', '231,503', '382,224', '21,312,000', '465,911', '6,516,743', '123,684', '266,194', '20,557', '2,041,284', '209,430', '345,342', '278,354', '10,309', '5,028,356', '389,424', '203,892', '69,087', '900,000', '2,041,222', '10,117,304', '3,614,771', '41,380', '6,516,743', '574,645', '62,729', '191,417', '256,182', '100,840', '102,215', '147,081']

Checking the number of elements in the 'nc17_tickets_x' list.

In [385]:
len(nc17_tickets_x)
Out[385]:
35

Creating the nc17_dataframe dataframe.

In [386]:
nc17_dataframe =  pd.DataFrame({"Movie":nc17_name, "Release_Date":nc17_date, "Genre":nc17_genre,
                            "Rating":nc17_rated, 
                            "Production_Budget":nc17_budget, "Production_Budget_x":nc17_budget_x, 
                            "Domestic_Gross":nc17_domestic, "Domestic_Gross_x":nc17_domestic_x, 
                            "Foreign_Gross":nc17_foreign, "Foreign_Gross_x":nc17_foreign_x, 
                            "Worldwide_Gross":nc17_worldwide, "Worldwide_Gross_x":nc17_worldwide_x, 
                            "Profit":nc17_profit, "Profit_x":nc17_profit_x, "Tickets":nc17_tickets, 
                            "Tickets_x":nc17_tickets_x, "Runtime":nc17_runtime, "Averagerating":nc17_rating, 
                            "Company":nc17_company, "Star":nc17_star, "Director":nc17_director, "Writer":nc17_writer
                             })

The first five rows of the nc17_dataframe dataframe.

In [387]:
nc17_dataframe.head()
Out[387]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x ... Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
0 Showgirls September 22, 1995 Drama NC-17 45000000 $45,000,000 20350754 $20,350,754 17400000 $17,400,000 ... -7249246 $-7,249,246 3775075 3,775,075 97 4.9 Carolco Pictures Elizabeth Berkley Paul Verhoeven Joe Eszterhas
1 The Dreamers February 6, 2004 Drama NC-17 15000000 $15,000,000 2531462 $2,531,462 12775651 $12,775,651 ... 307113 $307,113 1530711 1,530,711 130 7.1 Recorded Picture Company Eva Green Bernardo Bertolucci Gilbert Adair
2 Shame December 2, 2011 Drama NC-17 6500000 $6,500,000 4002293 $4,002,293 16410548 $16,410,548 ... 13912841 $13,912,841 2041284 2,041,284 101 7.2 Film4 Michael Fassbender Steve McQueen Abi Morgan
3 Blue Is the Warmest Colour October 25, 2013 Drama NC-17 4074940 $4,074,940 2199787 $2,199,787 17266048 $17,266,048 ... 15390895 $15,390,895 1946584 1,946,584 180 7.7 Wild Bunch Léa Seydoux Abdellatif Kechiche Ghalia Lacroix
4 Blue Valentine December 29, 2010 Drama NC-17 1000000 $1,000,000 9737892 $9,737,892 6828348 $6,828,348 ... 15566240 $15,566,240 1656624 1,656,624 120 7.4 Hunting Lane Films Ryan Gosling Derek Cianfrance Cami Delavigne

5 rows × 22 columns

Appending the nc17_dataframe dataframe to the Drama_DataFrame.

In [388]:
Drama_DataFrame = pd.concat([Drama_DataFrame, nc17_dataframe], ignore_index=True)

The 'NC-17' rating now has more movies in the Drama_DataFrame.

In [390]:
# Count the ratings of all rows whose genre is 'Drama'.
grouped2 = Counter(
    rating
    for genre, rating in zip(Drama_DataFrame.Genre, Drama_DataFrame.Rating)
    if genre == 'Drama'
)
grouped2
Out[390]:
Counter({'PG': 67, 'R': 77, 'PG-13': 76, 'Not Rated': 3, 'NC-17': 49, 'G': 34})

Some of the Worldwide_Gross values are strings instead of ints; this code converts them back to int.

In [391]:
new = []
for i in Drama_DataFrame.Worldwide_Gross:
    if isinstance(i, str):
        # strip the '$' and ',' characters before casting to int
        new.append(int(i.replace("$", "").replace(",", "")))
    else:
        new.append(i)
print(new) #showing the new list 
[180047784, 142634358, 693698673, 449948323, 634454789, 54462971, 368567189, 137551594, 47818913, 84154026, 381398492, 371350619, 90552675, 74966854, 134612435, 213591522, 179748880, 108660270, 41642166, 26387039, 71004627, 203127894, 48478084, 570998101, 162498338, 169590606, 16340767, 31124367, 116809717, 50647416, 173567581, 96068724, 97143987, 85309093, 252276928, 61721826, 24687524, 63802928, 165552290, 160558438, 15826984, 197618160, 68984536, 16481405, 94050951, 15815509, 41059418, 213120004, 142033509, 96633833, 66540205, 29847480, 82917283, 6792768, 64282881, 77735925, 32398681, 31054727, 4065020, 38017873, 5046038, 304604712, 92678948, 208265198, 46604054, 22281732, 28270399, 3727746, 76086711, 14189810, 7680250, 7719630, 334522294, 38028230, 52545707, 56506120, 8217571, 7585011, 128955898, 20601987, 59168692, 11173718, 528731, 34044909, 331266710, 38358392, 33069303, 36262783, 11831131, 32909437, 19859167, 23477345, 35830713, 12034913, 42843521, 78356170, 62076141, 56178935, 70133905, 61603136, 31556959, 2179623, 36787044, 21817298, 81831866, 21971021, 77733867, 50827466, 31187727, 36964656, 10765283, 382946, 20412841, 16369708, 148806510, 41699612, 17499242, 18945682, 2821010, 6205034, 17536004, 4972016, 1027760, 1200000, 57273049, 679482, 40454520, 20433227, 38969037, 73975239, 23251930, 15298355, 35185884, 16610760, 16131551, 11295324, 10153415, 2088390, 7482387, 6328516, 21270290, 12231500, 14244931, 5552584, 16566240, 5438911, 1156309, 852399, 354836, 3728400, 62375, 2102779, 429448, 2769782, 9709597, 46918287, 542351353, 73986904, 305937718, 216601214, 38102988, 27118000, 534816, 37306334, 47494916, 19344615, 38741732, 114830111, 43545364, 18948425, 3438735, 137587063, 64605762, 33473297, 89137047, 8526288, 64667874, 106269971, 35656130, 3987768, 7025496, 152036382, 171120329, 13835130, 14859394, 134582776, 6101815, 63954968, 10769960, 32255440, 15164458, 127956187, 2819485, 43440294, 17815212, 157297525, 35856053, 119285432, 40716963, 14920781, 
3281232, 14923752, 125052686, 549368315, 6668025, 199078, 64892670, 4786789, 8443124, 2044892, 2400000, 1705908, 80008942, 48000000, 17356268, 1008404, 277845, 1614784, 20412216, 20350754, 98410061, 496059, 15121165, 1022148, 67091915, 20412841, 19465835, 195494, 2411143, 1025228, 18587135, 8721243, 40300, 10015449, 80693537, 54766923, 77211836, 34718173, 1951683, 636796, 2447576, 9171289, 3256082, 13000000, 11000000, 438656843, 66947950, 27469621, 37799643, 325500000, 246100000, 3750000, 69131860, 4517000, 143985708, 10015449, 17657973, 80491516, 311281000, 286214195, 90482317, 986214868, 268000000, 72071636, 108998, 47707417, 30194409, 65500000, 7600377, 12000000, 592861, 37750754, 15307113, 20412841, 19465835, 16566240, 2315026, 3822241, 213120004, 4659110, 65167430, 1236844, 2661944, 205569, 20412841, 2094302, 3453416, 2783535, 103093, 50283563, 3894240, 2038916, 690872, 9000000, 20412216, 101173038, 36147711, 413802, 65167430, 5746453, 627287, 1914166, 2561820, 1008404, 1022148, 1470809]

Checking the number of elements in the 'new' list.

In [392]:
len(new)
Out[392]:
306

Now that they are all integers, the Worldwide_Gross column of the dataframe is updated with the new and improved values.

In [393]:
Drama_DataFrame.Worldwide_Gross = new

Some of the Worldwide_Gross_x values are integers instead of dollar-formatted strings; this code converts them to dollar strings.

In [394]:
new1 = []
for i in  Drama_DataFrame.Worldwide_Gross_x:
    if isinstance(i, int):new1.append('${:,.0f}'.format(i))
    else:new1.append(i)
print(new1) #showing the new1 list 
['$180,047,784', '$142,634,358', '$693,698,673', '$449,948,323', '$634,454,789', '$54,462,971', '$368,567,189', '$137,551,594', '$47,818,913', '$84,154,026', '$381,398,492', '$371,350,619', '$90,552,675', '$74,966,854', '$134,612,435', '$213,591,522', '$179,748,880', '$108,660,270', '$41,642,166', '$26,387,039', '$71,004,627', '$203,127,894', '$48,478,084', '$570,998,101', '$162,498,338', '$169,590,606', '$16,340,767', '$31,124,367', '$116,809,717', '$50,647,416', '$173,567,581', '$96,068,724', '$97,143,987', '$85,309,093', '$252,276,928', '$61,721,826', '$24,687,524', '$63,802,928', '$165,552,290', '$160,558,438', '$15,826,984', '$197,618,160', '$68,984,536', '$16,481,405', '$94,050,951', '$15,815,509', '$41,059,418', '$213,120,004', '$142,033,509', '$96,633,833', '$66,540,205', '$29,847,480', '$82,917,283', '$6,792,768', '$64,282,881', '$77,735,925', '$32,398,681', '$31,054,727', '$4,065,020', '$38,017,873', '$5,046,038', '$304,604,712', '$92,678,948', '$208,265,198', '$46,604,054', '$22,281,732', '$28,270,399', '$3,727,746', '$76,086,711', '$14,189,810', '$7,680,250', '$7,719,630', '$334,522,294', '$38,028,230', '$52,545,707', '$56,506,120', '$8,217,571', '$7,585,011', '$128,955,898', '$20,601,987', '$59,168,692', '$11,173,718', '$528,731', '$34,044,909', '$331,266,710', '$38,358,392', '$33,069,303', '$36,262,783', '$11,831,131', '$32,909,437', '$19,859,167', '$23,477,345', '$35,830,713', '$12,034,913', '$42,843,521', '$78,356,170', '$62,076,141', '$56,178,935', '$70,133,905', '$61,603,136', '$31,556,959', '$2,179,623', '$36,787,044', '$21,817,298', '$81,831,866', '$21,971,021', '$77,733,867', '$50,827,466', '$31,187,727', '$36,964,656', '$10,765,283', '$382,946', '$20,412,841', '$16,369,708', '$148,806,510', '$41,699,612', '$17,499,242', '$18,945,682', '$2,821,010', '$6,205,034', '$17,536,004', '$4,972,016', '$1,027,760', '$1,200,000', '$57,273,049', '$679,482', '$40,454,520', '$20,433,227', '$38,969,037', '$73,975,239', '$23,251,930', '$15,298,355', 
'$35,185,884', '$16,610,760', '$16,131,551', '$11,295,324', '$10,153,415', '$2,088,390', '$7,482,387', '$6,328,516', '$21,270,290', '$12,231,500', '$14,244,931', '$5,552,584', '$16,566,240', '$5,438,911', '$1,156,309', '$852,399', '$354,836', '$3,728,400', '$62,375', '$2,102,779', '$429,448', '$2,769,782', '$9,709,597', '$46,918,287', '$542,351,353', '$73,986,904', '$305,937,718', '$216,601,214', '$38,102,988', '$27,118,000', '$534,816', '$37,306,334', '$47,494,916', '$19,344,615', '$38,741,732', '$114,830,111', '$43,545,364', '$18,948,425', '$3,438,735', '$137,587,063', '$64,605,762', '$33,473,297', '$89,137,047', '$8,526,288', '$64,667,874', '$106,269,971', '$35,656,130', '$3,987,768', '$7,025,496', '$152,036,382', '$171,120,329', '$13,835,130', '$14,859,394', '$134,582,776', '$6,101,815', '$63,954,968', '$10,769,960', '$32,255,440', '$15,164,458', '$127,956,187', '$2,819,485', '$43,440,294', '$17,815,212', '$157,297,525', '$35,856,053', '$119,285,432', '$40,716,963', '$14,920,781', '$3,281,232', '$14,923,752', '$125,052,686', '$549,368,315', '$6,668,025', '$199,078', '$64,892,670', '$4,786,789', '$8,443,124', '$2,044,892', '$2,400,000', '$1,705,908', '$80,008,942', '$48,000,000', '$17,356,268', '$1,008,404', '$277,845', '$1,614,784', '$20,412,216', '$20,350,754', '$98,410,061', '$496,059', '$15,121,165', '$1,022,148', '$67,091,915', '$20,412,841', '$19,465,835', '$195,494', '$2,411,143', '$1,025,228', '$18,587,135', '$8,721,243', '$40,300', '$10,015,449', '$80,693,537', '$54,766,923', '$77,211,836', '$34,718,173', '$1,951,683', '$636,796', '$2,447,576', '$9,171,289', '$3,256,082', '$13,000,000', '$11,000,000', '$438,656,843', '$66,947,950', '$27,469,621', '$37,799,643', '$325,500,000', '$246,100,000', '$3,750,000', '$69,131,860', '$4,517,000', '$143,985,708', '$10,015,449', '$17,657,973', '$80,491,516', '$311,281,000', '$286,214,195', '$90,482,317', '$986,214,868', '$268,000,000', '$72,071,636', '$108,998', '$47,707,417', '$30,194,409', '$65,500,000', 
'$7,600,377', '$12,000,000', '$592,861', '$37,750,754', '$15,307,113', '$20,412,841', '$19,465,835', '$16,566,240', '$2,315,026', '$3,822,241', '$213,120,004', '$4,659,110', '$65,167,430', '$1,236,844', '$2,661,944', '$205,569', '$20,412,841', '$2,094,302', '$3,453,416', '$2,783,535', '$103,093', '$50,283,563', '$3,894,240', '$2,038,916', '$690,872', '$9,000,000', '$20,412,216', '$101,173,038', '$36,147,711', '$413,802', '$65,167,430', '$5,746,453', '$627,287', '$1,914,166', '$2,561,820', '$1,008,404', '$1,022,148', '$1,470,809']

Checking the number of elements in the 'new1' list.

In [395]:
len(new1)
Out[395]:
306

Now that they are all strings, the Worldwide_Gross_x column of the dataframe is updated with the new and improved values.

In [396]:
Drama_DataFrame.Worldwide_Gross_x = new1
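
The two conversions used for the Worldwide_Gross and Worldwide_Gross_x columns can be sketched as a pair of small helpers. The function names `dollars_to_int` and `int_to_dollars` are hypothetical and not part of the notebook; the sample values are taken from rows shown earlier. Note that the format string places the minus sign after the dollar sign, which matches the `$-7,249,246` style already present in the data.

```python
def dollars_to_int(value):
    """Turn a string such as '$1,008,404' into the integer 1008404."""
    if isinstance(value, str):
        return int(value.replace("$", "").replace(",", ""))
    return int(value)


def int_to_dollars(value):
    """Format a number the way the *_x columns do; strings pass through."""
    if isinstance(value, str):
        return value
    return "${:,.0f}".format(value)


# A mix of strings and integers, as found in the columns before cleaning.
mixed = ["$20,350,754", 1470809, "$-7,249,246"]
as_ints = [dollars_to_int(v) for v in mixed]
as_text = [int_to_dollars(v) for v in as_ints]
print(as_ints)
print(as_text)
```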

This is the final product of all the editing and merging of the other csv files, together with the extra dataframes of 'R', 'G', 'NC-17' and 'PG' rated movies that were added because the system ratings were unevenly distributed in the previous dataframe. Drama_DataFrame will be the dataframe used throughout this analysis.

In [397]:
pd.set_option('display.max_columns', None)#showing all the columns
Drama_DataFrame
Out[397]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x Worldwide_Gross Worldwide_Gross_x Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
0 Hugo Nov 23, 2011 Drama PG 180000000.0 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 180047784 $180,047,784 47784.0 $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Asa Butterfield Martin Scorsese John Logan
1 The Wolfman Feb 12, 2010 Drama R 150000000.0 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 142634358 $142,634,358 -7365642.0 $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Benicio Del Toro Joe Johnston Andrew Kevin Walker
2 Gravity Oct 4, 2013 Drama PG-13 110000000.0 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 693698673 $693,698,673 583698673.0 $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. Sandra Bullock Alfonso Cuarón Alfonso Cuarón
3 Django Unchained Dec 25, 2012 Drama R 100000000.0 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 449948323 $449,948,323 349948323.0 $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Jamie Foxx Quentin Tarantino Quentin Tarantino
4 Sing Dec 21, 2016 Drama PG-13 75000000.0 $75,000,000 270329045 $270,329,045 363800000.0 $363,800,000 634454789 $634,454,789 559454789.0 $559,454,789 63445479 63,445,479 98.0 7.1 TriStar Pictures Lorraine Bracco Richard Baskin Dean Pitchford
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
301 A Dirty Shame September 24, 2004 Drama NC-17 15000000.0 $15,000,000 1339668 $1,339,668 574498.0 $574,498 1914166 $1,914,166 -13085834.0 $-13,085,834 191417 191,417 84.0 5.1 Killer Films Suzanne Shepherd John Waters John Waters
302 Young Adam April 16, 2004 Drama NC-17 6400000.0 $6,400,000 767373 $767,373 1794447.0 $1,794,447 2561820 $2,561,820 -3838180.0 $-3,838,180 256182 256,182 98.0 6.4 Recorded Picture Company Tilda Swinton David Mackenzie \tDavid Mackenzie
303 Whore 1991 October 4, 1991 Drama NC-17 50000.0 $50,000 0 $0 0.0 $0 1008404 $1,008,404 958404.0 $958,404 100840 100,840 80.0 5.5 Cheap Date Theresa Russell Ken Russell Deborah Dalton
304 Ma Mère May 13, 2005 Drama NC-17 3259572.0 $3,259,572 71616 $71,616 950532.0 $950,532 1022148 $1,022,148 -2237424.0 $-2,237,424 102215 102,215 110.0 5.0 Gemini Films Louis Garrel Christophe Honoré Christophe Honoré
305 Law of Desire April 3, 1987 Drama NC-17 612072.0 $612,072 0 $0 0.0 $0 1470809 $1,470,809 858737.0 $858,737 147081 147,081 82.0 7.1 El Deseo Antonio Banderas Pedro Almodóvar Pedro Almodóvar

306 rows × 22 columns

1. Return on Investment or ROI

This is the blueprint for creating the first visualization, Return on Investment (ROI). Pandas DataFrames will be used to create these visualizations. The dataframes will be styled with the Styler class by passing style functions into Styler.apply or Styler.applymap. The styling will be performed after the data in the DataFrames has been processed. There will be five dataframes based on the system ratings: 'PG', 'PG-13', 'R', 'NC-17' and 'G'. The Styler renders the dataframe as HTML and uses CSS to control many parameters, including colors, fonts, borders and backgrounds.

Blueprint:

  1. The dataframe used in these visualizations is Drama_DataFrame, which contains movies in the Drama genre. Drama_DataFrame consists of the following columns:

    • Movie
    • Release_Date
    • Genre
    • Rating
    • Production_Budget
    • Production_Budget_x
    • Domestic_Gross
    • Domestic_Gross_x
    • Foreign_Gross
    • Foreign_Gross_x
    • Worldwide_Gross
    • Worldwide_Gross_x
    • Profit
    • Profit_x
    • Tickets
    • Tickets_x
    • Runtime
    • Averagerating
    • Company
    • Star
    • Director
    • Writer
  2. There will be five styled dataframes, each holding key information about the movies. These dataframes are categorized by system rating: 'R', 'PG-13', 'PG', 'G' and 'NC-17'. These are the five columns that will be in every dataframe:

    • Name of Movie
    • Cost
    • Return on Investment
    • ROI Percentage
    • Rating
  3. The dataframes need visual components that convey the information smoothly and also set them apart from one another, because they are all in groups. These are the formats that help do so:

    • Color: Each dataframe will have a different shade of red to distinguish its system rating from the others.
    • Set_table_styles: This allows the columns, headings and indexes to be changed to different styles and colours.
    • Highlight_cells: This is not a built-in function but a function created to specifically highlight unique information in the dataframe.
    • Borders: This is also not a built-in function but a function created to modify the color and size of the dataframe's borders and to choose which rows or columns have their borders modified.
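
As a rough sketch of the Set_table_styles item, the styles passed to pandas' `Styler.set_table_styles` are plain lists of dictionaries, each with a `selector` and `props`. The helper name `header_styles` and the hex shades below are assumptions for illustration, not the colours used in this project.

```python
def header_styles(shade):
    """Build a table_styles list that colours the column and row headers."""
    return [
        {"selector": "th.col_heading",
         "props": [("background-color", shade), ("color", "white")]},
        {"selector": "th.row_heading",
         "props": [("background-color", shade), ("color", "white")]},
    ]


# Hypothetical red shades, one per system rating (illustration only).
rating_shades = {"R": "#8b0000", "PG-13": "#b22222", "PG": "#cd5c5c",
                 "G": "#f08080", "NC-17": "#a52a2a"}
styles_by_rating = {r: header_styles(s) for r, s in rating_shades.items()}
print(styles_by_rating["R"])
```

A list such as `styles_by_rating["R"]` would then be handed to the `set_table_styles` method of a dataframe's `.style` object.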

Itables is a library that is installed so that all dataframes are shown as interactive datatables.

In [14]:
from itables import init_notebook_mode
init_notebook_mode(all_interactive=True)

Drama_DataFrame is the dataframe that will be used throughout this analysis. (this dataframe is interactive)

In [15]:
Drama_DataFrame
Out[15]:
Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x Worldwide_Gross Worldwide_Gross_x Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
Loading... (need help?)

The following cells build the dataframes for the 'R' rated movies in the Drama genre, based on the ROI of each movie.

Index of all the 'R' rated movies.

In [375]:
r_index = []
for i,x in enumerate(Drama_DataFrame.Rating):
    if x == 'R':r_index.append(i)
print(r_index) #showing the r_index list 
[1, 3, 5, 6, 9, 10, 11, 13, 14, 23, 29, 36, 39, 53, 55, 56, 57, 58, 59, 64, 66, 67, 71, 76, 77, 81, 82, 84, 85, 87, 88, 90, 92, 93, 94, 97, 98, 101, 103, 106, 110, 111, 116, 118, 120, 121, 124, 125, 126, 127, 128, 130, 133, 134, 135, 136, 137, 139, 140, 142, 144, 145, 146, 147, 150, 152, 153, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244]

Checking the number of elements in the 'r_index' list.

In [402]:
len(r_index)
Out[402]:
77
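
The same index list can be sketched with a boolean mask instead of an explicit loop. The frame below is a made-up stand-in for Drama_DataFrame, and `r_index_toy` and `r_names_toy` are names introduced only for this example.

```python
import pandas as pd

# Hypothetical miniature stand-in for Drama_DataFrame (illustration only).
toy = pd.DataFrame({
    "Movie":  ["A", "B", "C", "D"],
    "Rating": ["PG", "R", "PG-13", "R"],
})

# Index labels of the 'R' rows; with the default RangeIndex these match
# the positions that an enumerate loop would collect.
r_index_toy = toy.index[toy["Rating"] == "R"].tolist()

# The same mask can pull a whole column for those rows in one step.
r_names_toy = toy.loc[toy["Rating"] == "R", "Movie"].tolist()
print(r_index_toy, r_names_toy)
```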

Getting the Profit for all 'R' rated movies.

In [51]:
r_profit = []
for i in r_index:
        r_profit.append(Drama_DataFrame.Profit[i])
print(r_profit) #showing the r_profit list 
[-7365642.0, 349948323.0, -13537029.0, 307567189.0, 24154026.0, 326398492.0, 316350619.0, 19966854.0, 82112435.0, 530998101.0, 13147416.0, -10312476.0, 129558438.0, -18207232.0, 54735925.0, 9898681.0, 8554727.0, -17934980.0, 17017873.0, 26604054.0, 8270399.0, -16272254.0, -10280370.0, -7782429.0, -8414989.0, -3826282.0, -14471269.0, 318266710.0, 25358392.0, 23262783.0, -1168869.0, 7859167.0, 23830713.0, 34913.0, 31043521.0, 45178935.0, 60133905.0, -7820377.0, 12417298.0, 69233867.0, 3765283.0, -6617054.0, 12499242.0, -2178990.0, 12636004.0, 222016.0, 53273049.0, -3320518.0, 36954520.0, 17033227.0, 35669037.0, 20251930.0, 14610760.0, 14131551.0, 9295324.0, 8153415.0, 88390.0, 4328516.0, 19282640.0, 12744931.0, 15566240.0, 4438911.0, 156309.0, -147601.0, -187625.0, 294448.0, 2669782.0, 48766923.0, 68711836.0, 14718173.0, 1851683.0, -25363204.0, -4052424.0, -12828711.0, 556082.0, 1500000.0, 2000000.0]

Checking the number of elements in the 'r_profit' list.

In [3]:
len(r_profit)
Out[3]:
77

Getting the Cost for all 'R' rated movies.

In [52]:
r_cost = []
for i in r_index:
        r_cost.append(Drama_DataFrame.Production_Budget_x[i])
print(r_cost) #showing the r_cost list 
['$150,000,000', '$100,000,000', '$68,000,000', '$61,000,000', '$60,000,000', '$55,000,000', '$55,000,000', '$55,000,000', '$52,500,000', '$40,000,000', '$37,500,000', '$35,000,000', '$31,000,000', '$25,000,000', '$23,000,000', '$22,500,000', '$22,500,000', '$22,000,000', '$21,000,000', '$20,000,000', '$20,000,000', '$20,000,000', '$18,000,000', '$16,000,000', '$16,000,000', '$15,000,000', '$15,000,000', '$13,000,000', '$13,000,000', '$13,000,000', '$13,000,000', '$12,000,000', '$12,000,000', '$12,000,000', '$11,800,000', '$11,000,000', '$10,000,000', '$10,000,000', '$9,400,000', '$8,500,000', '$7,000,000', '$7,000,000', '$5,000,000', '$5,000,000', '$4,900,000', '$4,750,000', '$4,000,000', '$4,000,000', '$3,500,000', '$3,400,000', '$3,300,000', '$3,000,000', '$2,000,000', '$2,000,000', '$2,000,000', '$2,000,000', '$2,000,000', '$2,000,000', '$1,987,650', '$1,500,000', '$1,000,000', '$1,000,000', '$1,000,000', '$1,000,000', '$250,000', '$135,000', '$100,000', '$6,000,000', '$8,500,000', '$20,000,000', '$100,000', '$26,000,000', '$6,500,000', '$22,000,000', '$2,700,000', '$11,500,000', '$9,000,000']

Checking the number of elements in the 'r_cost' list.

In [405]:
len(r_cost)
Out[405]:
77

Getting the Name for all 'R' rated movies.

In [53]:
r_name = []
for i in r_index:
        r_name.append(Drama_DataFrame.Movie[i])
print(r_name) #showing the r_name list 
['The Wolfman', 'Django Unchained', 'Downsizing', 'Gone Girl', 'Priest', 'Fifty Shades Darker', 'Fifty Shades Freed', 'Crimson Peak', 'Zero Dark Thirty', 'Fifty Shades of Grey', 'The Master', 'Biutiful', 'Flight', 'Tulip Fever', 'The Ides of March', 'Nocturnal Animals', 'The Water Diviner', 'Stone', 'For Colored Girls', 'The Debt', 'Let Me In', 'By the Sea', 'Miss Sloane', 'The Homesman', 'The Immigrant', 'Never Let Me Go', 'The Reluctant Fundamentalist', 'Black Swan', 'Ex Machina', 'Room', 'Chloe', 'If Beale Street Could Talk', 'Arbitrage', 'Stoker', 'Carol', 'Quartet', 'Hereditary', 'Coriolanus', 'Melancholia', 'Manchester by the Sea', 'We Need to Talk About Kevin', 'Hesher', 'Addicted', 'Everything Must Go', 'Mommy', 'Take Shelter', 'Boyhood', 'Stake Land', 'The Witch', 'Margin Call', 'Whiplash', 'Before Midnight', 'Silent House', "Winter's Bone", 'The Florida Project', 'We Are Your Friends', 'Locke', 'Knock Knock', 'Buried', 'Unsane', 'Blue Valentine', 'Martha Marcy May Marlene', 'Palo Alto', 'I Origins', 'The Canyons', 'Sound of My Voice', 'A Ghost Story', 'Ordinary People', 'Fame', 'Endless Love', 'Ghost Story', 'One from the Heart', 'The Hand', 'Pennies from Heaven', 'Zoot Suit', 'Rich and Famous', 'Raggedy Man']

Checking the number of elements in the 'r_name' list.

In [5]:
len(r_name)
Out[5]:
77

Getting the ROI for all 'R' rated movies.

In [54]:
r_return_on_investment = []
for i in r_index:
        r_return_on_investment.append(Drama_DataFrame.Profit_x[i])
print(r_return_on_investment) #showing the r_return_on_investment list 
['$-7,365,642', '$349,948,323', '$-13,537,029', '$307,567,189', '$24,154,026', '$326,398,492', '$316,350,619', '$19,966,854', '$82,112,435', '$530,998,101', '$13,147,416', '$-10,312,476', '$129,558,438', '$-18,207,232', '$54,735,925', '$9,898,681', '$8,554,727', '$-17,934,980', '$17,017,873', '$26,604,054', '$8,270,399', '$-16,272,254', '$-10,280,370', '$-7,782,429', '$-8,414,989', '$-3,826,282', '$-14,471,269', '$318,266,710', '$25,358,392', '$23,262,783', '$-1,168,869', '$7,859,167', '$23,830,713', '$34,913', '$31,043,521', '$45,178,935', '$60,133,905', '$-7,820,377', '$12,417,298', '$69,233,867', '$3,765,283', '$-6,617,054', '$12,499,242', '$-2,178,990', '$12,636,004', '$222,016', '$53,273,049', '$-3,320,518', '$36,954,520', '$17,033,227', '$35,669,037', '$20,251,930', '$14,610,760', '$14,131,551', '$9,295,324', '$8,153,415', '$88,390', '$4,328,516', '$19,282,640', '$12,744,931', '$15,566,240', '$4,438,911', '$156,309', '$-147,601', '$-187,625', '$294,448', '$2,669,782', '$48,766,923', '$68,711,836', '$14,718,173', '$1,851,683', '$-25,363,204', '$-4,052,424', '$-12,828,711', '$556,082', '$1,500,000', '$2,000,000']

Checking the number of elements in the 'r_return_on_investment' list.

In [408]:
len(r_return_on_investment)
Out[408]:
77

Getting the Ratings of all 'R' rated movies.

In [55]:
r_rating = []
for i in r_index:
        r_rating.append(Drama_DataFrame.Averagerating[i])
print(r_rating) #showing the r_rating list 
[5.8, 8.4, 5.7, 8.1, 5.7, 4.6, 4.5, 6.5, 7.4, 4.1, 7.1, 7.5, 7.3, 6.2, 7.1, 7.5, 7.1, 5.6, 6.1, 6.9, 7.1, 5.3, 7.5, 6.6, 6.6, 7.1, 6.9, 8.0, 7.7, 8.2, 6.9, 7.2, 6.6, 6.8, 7.2, 6.8, 7.3, 8.7, 7.2, 7.8, 7.5, 7.0, 5.2, 6.4, 8.1, 7.4, 7.9, 6.5, 6.8, 7.1, 8.5, 7.9, 5.3, 7.2, 7.6, 6.2, 7.1, 4.9, 7.0, 6.4, 7.4, 6.9, 6.2, 7.4, 3.8, 6.6, 6.8, 7.7, 6.6, 4.9, 6.3, 6.5, 5.5, 6.5, 6.8, 5.9, 6.8]

Checking the number of elements in the 'r_rating' list.

In [7]:
len(r_rating)
Out[7]:
77

Getting the Profit Percentage of all 'R' rated movies.

In [376]:
r_percent_profit = []
for i in r_index:
    i = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
    r_percent_profit.append(int(round(i,0)))
print(r_percent_profit) #showing the r_percent_profit list 
[-5, 350, -20, 504, 40, 593, 575, 36, 156, 1327, 35, -29, 418, -73, 238, 44, 38, -82, 81, 133, 41, -81, -57, -49, -53, -26, -96, 2448, 195, 179, -9, 65, 199, 0, 263, 411, 601, -78, 132, 815, 54, -95, 250, -44, 258, 5, 1332, -83, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970, 850, 1557, 444, 16, -15, -75, 218, 2670, 813, 808, 74, 1852, -98, -62, -58, 21, 13, 22]

Checking the number of elements in the 'r_percent_profit' list.

In [377]:
len(r_percent_profit)
Out[377]:
77
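
The calculation in the loop is profit divided by production budget, times 100, rounded to a whole number. A minimal sketch with three of the 'R' rated movies listed above; the helper name `roi_percent` is hypothetical.

```python
def roi_percent(profit, budget):
    """ROI percentage: profit as a share of the production budget, rounded."""
    return int(round(profit / budget * 100, 0))


# (movie, profit, production budget) taken from the 'R' rated lists above.
samples = [
    ("The Wolfman", -7365642.0, 150000000.0),
    ("Django Unchained", 349948323.0, 100000000.0),
    ("Gone Girl", 307567189.0, 61000000.0),
]
roi_values = [roi_percent(p, b) for _, p, b in samples]
roi_labels = ["{}%".format(v) for v in roi_values]
print(roi_labels)
```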

Converting the integer ROI values of all 'R' rated movies into percentage strings.

In [57]:
r_roi_percent = []
for i in r_percent_profit:
    r_roi_percent.append("{:}%".format(i))
print(r_roi_percent) #showing the r_roi_percent list 
['-5%', '350%', '-20%', '504%', '40%', '593%', '575%', '36%', '156%', '1327%', '35%', '-29%', '418%', '-73%', '238%', '44%', '38%', '-82%', '81%', '133%', '41%', '-81%', '-57%', '-49%', '-53%', '-26%', '-96%', '2448%', '195%', '179%', '-9%', '65%', '199%', '0%', '263%', '411%', '601%', '-78%', '132%', '815%', '54%', '-95%', '250%', '-44%', '258%', '5%', '1332%', '-83%', '1056%', '501%', '1081%', '675%', '731%', '707%', '465%', '408%', '4%', '216%', '970%', '850%', '1557%', '444%', '16%', '-15%', '-75%', '218%', '2670%', '813%', '808%', '74%', '1852%', '-98%', '-62%', '-58%', '21%', '13%', '22%']

Checking the number of elements in the 'r_roi_percent' list.

In [413]:
len(r_roi_percent)
Out[413]:
77

Turning the average rating of each 'R' rated movie into a row of stars, one star per whole rating point.

In [58]:
r_stars = []
for i in r_rating:
    r_stars.append('*'*int(i))
print(r_stars) #showing the r_stars list 
['*****', '********', '*****', '********', '*****', '****', '****', '******', '*******', '****', '*******', '*******', '*******', '******', '*******', '*******', '*******', '*****', '******', '******', '*******', '*****', '*******', '******', '******', '*******', '******', '********', '*******', '********', '******', '*******', '******', '******', '*******', '******', '*******', '********', '*******', '*******', '*******', '*******', '*****', '******', '********', '*******', '*******', '******', '******', '*******', '********', '*******', '*****', '*******', '*******', '******', '*******', '****', '*******', '******', '*******', '******', '******', '*******', '***', '******', '******', '*******', '******', '****', '******', '******', '*****', '******', '******', '*****', '******']

Checking the number of elements in the 'r_stars' list.

In [415]:
len(r_stars)
Out[415]:
77
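
One detail worth noting in the star conversion is that `int()` truncates rather than rounds, so a rating of 8.7 yields eight stars. A small sketch; the helper name `stars` is introduced here for illustration.

```python
def stars(score):
    """Return one '*' per whole point of the rating (the fraction is dropped)."""
    return "*" * int(score)


print(stars(8.7))  # int() truncates, so the .7 does not add a star
print(stars(3.8))
```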

Creating the 'R' rated dataframe with the variables previously created.

In [193]:
system_rating_r = pd.DataFrame({"Name of Movie":r_name, "Cost":r_cost, 
                                "Return On Investment":r_return_on_investment, 
                                "ROI Percentage":r_roi_percent,"Ratings":r_stars})

The 'system_rating_r' dataframe. (this dataframe is interactive)

In [372]:
system_rating_r
Out[372]:
Name of Movie Cost Return On Investment ROI Percentage Ratings
Loading... (need help?)

Getting the index of all the movies whose ROI percentage is zero or negative.

In [378]:
neg_values = []
for i,x in enumerate(r_percent_profit): 
    if x <= 0: neg_values.append(i)
print(neg_values) #showing the neg_values list 
[0, 2, 11, 13, 17, 21, 22, 23, 24, 25, 26, 30, 33, 37, 41, 43, 47, 63, 64, 71, 72, 73]

Checking the number of elements in the 'neg_values' list.

In [379]:
len(neg_values)
Out[379]:
22

Dropping the negative values and resetting the index of the system_rating_r dataframe.

In [380]:
system_rating_r = system_rating_r.drop(labels=neg_values, axis=0)
system_rating_r = system_rating_r.reset_index(drop=True)
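
The drop-and-reset step can also be sketched with a boolean mask on the parsed percentage, which removes the need for a separate index list. The frame below is made-up illustration data, and `profitable` is a name introduced here.

```python
import pandas as pd

# Hypothetical miniature stand-in for system_rating_r (illustration only).
toy = pd.DataFrame({
    "Name of Movie": ["A", "B", "C", "D"],
    "ROI Percentage": ["-5%", "350%", "0%", "40%"],
})

# Parse '350%' -> 350 so the values can be compared numerically.
roi = toy["ROI Percentage"].str.rstrip("%").astype(int)

# Keep only strictly positive ROI rows and renumber them from 0.
profitable = toy[roi > 0].reset_index(drop=True)
print(profitable)
```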

The new 'system_rating_r' dataframe. (this dataframe is interactive)

In [381]:
system_rating_r
Out[381]:
Name of Movie Cost Return On Investment ROI Percentage Ratings
Loading... (need help?)

Dividing the system_rating_r dataframe into three dataframes.

System_rating_r1 is the first dataframe. (this dataframe is interactive)

In [382]:
system_rating_r1=system_rating_r[:19]
system_rating_r1
Out[382]:
Name of Movie Cost Return On Investment ROI Percentage Ratings
Loading... (need help?)

System_rating_r2 is the second dataframe. (this dataframe is interactive)

In [383]:
system_rating_r2=system_rating_r[19:37]
system_rating_r2
Out[383]:
Name of Movie Cost Return On Investment ROI Percentage Ratings
Loading... (need help?)

System_rating_r3 is the third dataframe. (this dataframe is interactive)

In [384]:
system_rating_r3=system_rating_r[37:]
system_rating_r3
Out[384]:
Name of Movie Cost Return On Investment ROI Percentage Ratings
Loading... (need help?)
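
The slice boundaries 19 and 37 can be derived rather than typed by hand. The sketch below uses a hypothetical helper, `chunk_bounds`, that splits a row count into near-equal consecutive chunks, giving any leftover rows to the earliest chunks.

```python
def chunk_bounds(n_rows, n_chunks):
    """Return (start, stop) pairs that split n_rows into near-equal chunks."""
    base, extra = divmod(n_rows, n_chunks)
    bounds, start = [], 0
    for k in range(n_chunks):
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


# 77 'R' rated movies minus the 22 dropped rows leaves 55 rows.
bounds = chunk_bounds(77 - 22, 3)
print(bounds)
```

Each pair could then be used as a slice, for example `system_rating_r[start:stop]`.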

Getting the average Budget of all the 'R' rated movies in the Drama genre.

In [435]:
r_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                 for i in system_rating_r['Cost']]) / len(system_rating_r['Cost'])

The average Budget of all the 'R' rated Drama movies is $16,455,866.

In [436]:
r_avg_value
Out[436]:
16455866.363636363
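
Since the same parse-then-average pattern is repeated for several columns, it can be sketched as one helper. The name `mean_of_dollars` is hypothetical, and the sample uses only three of the 'Cost' strings from the 'R' rated list, so its result is not the project average.

```python
def mean_of_dollars(values):
    """Average a list of '$1,234' style strings as plain numbers."""
    amounts = [int(v.replace("$", "").replace(",", "")) for v in values]
    return sum(amounts) / len(amounts)


# A few 'Cost' strings from the 'R' rated list above (not the full column).
sample_costs = ["$100,000,000", "$61,000,000", "$60,000,000"]
sample_avg = mean_of_dollars(sample_costs)
print(sample_avg)
```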

Getting the index of all the movies that are below the average Budget of all the 'R' rated Drama movies, and the index of all the movies that are above it.

In [437]:
r_cost_index = [int(i.replace('$', '').replace(',', '')) for i in system_rating_r['Cost']]
#at or below avg
r_below_avg1 = []
for i,x in enumerate(r_cost_index):
    if x <= 16455866:r_below_avg1.append(i)
    
#at or above avg (the variable name is kept because later cells use it)
r_below_avg2 = []
for i,x in enumerate(r_cost_index):
    if x >= 16455866:r_below_avg2.append(i)

The 'r_below_avg1' list.

In [438]:
print(r_below_avg1)
[16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54]

The 'r_below_avg2' list (movies at or above the average Budget).

In [439]:
print(r_below_avg2)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 50]
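
Both comparisons in the cell above include equality (`<=` and `>=`), so a movie whose cost exactly matched the threshold would land in both lists. A minimal sketch of a split that keeps the two groups disjoint and computes the average instead of hard-coding it; the helper name `split_by_average` and the budgets are assumptions for illustration.

```python
def split_by_average(values):
    """Return (at_or_below, above) index lists relative to the mean."""
    avg = sum(values) / len(values)
    at_or_below = [i for i, v in enumerate(values) if v <= avg]
    above = [i for i, v in enumerate(values) if v > avg]
    return at_or_below, above


# Hypothetical budgets in dollars (illustration only); the mean is 20,000,000.
costs = [10_000_000, 20_000_000, 30_000_000]
low, high = split_by_average(costs)
print(low, high)
```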

Getting the average Return On Investment of all the 'R' rated movies in the Drama genre.

In [440]:
r_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                 for i in system_rating_r['Return On Investment']]) / len(system_rating_r['Cost'])

The average Return On Investment of all the 'R' rated Drama movies is $59,600,710.

In [441]:
r_avg_value
Out[441]:
59600710.27272727

Getting the index of all the movies that are below the average Return On Investment of all the 'R' rated Drama movies, and the index of all the movies that are above it.

In [430]:
r_roi_index = [int(i.replace('$', '').replace(',', '')) for i in system_rating_r['Return On Investment']]
#at or below avg
r_below_avg3 = []
for i,x in enumerate(r_roi_index):
    if x <= 59600710:r_below_avg3.append(i)
    
#at or above avg (the variable name is kept because later cells use it)
r_below_avg4 = []
for i,x in enumerate(r_roi_index):
    if x >= 59600710:r_below_avg4.append(i)

The 'r_below_avg3' list.

In [442]:
print(r_below_avg3)
[2, 5, 8, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54]

The 'r_below_avg4' list (movies at or above the average Return On Investment).

In [443]:
print(r_below_avg4)
[0, 1, 3, 4, 6, 7, 9, 16, 23, 25, 49]

Getting the average Return On Investment Percentage of all the 'R' rated movies in the Drama genre.

In [444]:
r_avg_value = sum([int(i.replace('%', ''))
                 for i in system_rating_r['ROI Percentage']]) / len(system_rating_r['Cost'])

The average Return On Investment Percentage of all the 'R' rated Drama movies is about 509% (508.87%).

In [445]:
r_avg_value
Out[445]:
508.8727272727273

Getting the index of all the movies that are above the average Return On Investment Percentage of all the 'R' rated Drama movies.

In [448]:
roi_percent_index_r = [int(i.replace('%', ''))for i in system_rating_r['ROI Percentage']]
#above avg
r_above_avg = []
for i,x in enumerate(roi_percent_index_r):
    if x >= 508:r_above_avg.append(i)

The 'r_above_avg' list.

In [449]:
r_above_avg
Out[449]:
[3, 4, 7, 16, 23, 25, 30, 31, 33, 34, 35, 36, 41, 42, 43, 47, 48, 49, 51]

After getting all the indexes of the movies that fit the criteria, they will be used to style the dataframes and to highlight particular cells. Functions are created to carry out this objective. There are eight main functions that will be used to get the expected results on the dataframes.

  • Def Ratings:

    • This function increases the size of the stars in the 'Ratings' column, which is the fourth column (counting from zero), and makes the stars bold. This styling is applied to all three divided dataframes.

    • When the dataframes were created, the font size of the 'Cost' column, which is the first column, seemed very small. The second part of the function therefore increases the font size of the 'Cost' column. This styling is also applied to all three divided dataframes.
  • Def Ratings_highlight:

    • This function was created to add colour to the dataframes.

    • The first part adds color to the 'Ratings' column, which is the fourth column, by making the background white and the stars gold.

    • The second part adds color to the 'Name of Movie' column, which is the zeroth column, by making the background white and the text a particular shade of red (there will be a different shade of red for each system rating dataframe). It also changes the font size and makes the text bold.

    • The third part adds color to the 'Cost', 'Return On Investment' and 'ROI Percentage' columns, which are the first, second and third columns, by making the background white, the text black and the font size bigger.
  • Def Highlight_cells1:

    • There are five highlight_cells functions for each dataframe.

    • This is the first Highlight_cells function. It adds colour to the 'Cost' column, which is the first column. It makes a cell in this column yellow if the 'Cost' of the movie is below the average budget of all the 'R' rated movies in the Drama genre, which is 16,455,866.
  • Def Highlight_cells2:

    • There are five highlight_cells functions for each dataframe.

    • This is the second Highlight_cells function. It adds colour to the 'Cost' column, which is the first column. It makes a cell in this column red if the 'Cost' of the movie is above the average budget of all the 'R' rated movies in the Drama genre, which is 16,455,866.
  • Def Highlight_cells3:

    • There are five highlight_cells functions for each dataframe.

    • This is the third highlight_cells function. It adds color to the 'Return On Investment' column, which is also considered the second column. It makes the cells in this column red if the 'ROI' of the movie is below the average ROI of all the 'R' rated movies in the Drama genre, which is 59,600,710.
  • Def Highlight_cells4:

    • There are five highlight_cells functions for each dataframe.

    • This is the fourth highlight_cells function. It adds color to the 'Return On Investment' column, which is also considered the second column. It makes the cells in this column yellow if the 'ROI' of the movie is above the average ROI of all the 'R' rated movies in the Drama genre, which is 59,600,710.
  • Def Highlight_cells5:

    • There are five highlight_cells functions for each dataframe.

    • This is the fifth highlight_cells function. It adds color to the 'ROI Percentage' column, which is also considered the third column. It makes the cells in this column green if the 'ROI Percentage' of the movie is above the average ROI Percentage of all the 'R' rated movies in the Drama genre, which is 510 percent.
  • Def borders:

    • This function adds a blue border to the rows of the dataframe that have yellow cells in the 'Cost' and 'Return On Investment' columns and a green cell in the 'ROI Percentage' column. Such a row marks a golden movie: it is below the average budget, above the average ROI and above the average ROI Percentage, so it hits all the targets. These movies will be paid close attention to later on.
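The golden-movie rule described above comes down to three comparisons per row. The sketch below is not the notebook's code; it is a minimal pure-Python illustration in which the helper name `golden_rows` and the sample figures are made up, while the three thresholds are the 'R' rated Drama averages quoted in the list above.

```python
def golden_rows(costs, rois, roi_percents, avg_cost, avg_roi, avg_roi_percent):
    """Return the positions of rows that hit all three targets.

    A row counts as 'golden' when its cost is below the average budget and
    both its ROI and its ROI percentage are above the respective averages.
    """
    golden = []
    for i, (cost, roi, pct) in enumerate(zip(costs, rois, roi_percents)):
        if cost < avg_cost and roi > avg_roi and pct > avg_roi_percent:
            golden.append(i)
    return golden


# Illustrative rows only (hypothetical movies, not taken from the dataset).
costs = [5_000_000, 40_000_000, 12_000_000]
rois = [80_000_000, 200_000_000, 20_000_000]
roi_percents = [1600, 500, 167]

golden = golden_rows(costs, rois, roi_percents, 16_455_866, 59_600_710, 510)
print(golden)
```

Only the first sample row passes all three checks: the second is over budget and the third falls short on ROI.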

Styling system_rating_r1 using the eight functions and the indexes computed above.

In [450]:
def Ratings1(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(19):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight2(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(19):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#ff5500;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
    return df 


def highlight_cells3(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in r_below_avg1[:3]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells4(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in r_below_avg2[:-1]:
        df.iloc[i,1] = 'background-color:#ff5500;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells5(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in r_below_avg3[:11]:
        df.iloc[i,2] = 'background-color:#ff5500;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells6(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in r_below_avg4[:8]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells7(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in r_above_avg[:4]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 

def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 


system_rating_r1 = (
    system_rating_r1.style.apply(Ratings_highlight2, axis=None)
    .set_table_styles([
        {'selector': '', 'props': [('border', '3px solid #ff5500')]},
        {'selector': 'thead', 'props': [('background-color', 'white'), ('color', '#ff5500')]},  # heading
        # {'selector': 'td', 'props': [('background-color', 'white'), ('color', 'black')]},  # inside chart
        {'selector': 'th.row_heading', 'props': [('background-color', 'white'), ('color', '#ff5500')]},  # index
    ])
    .apply(Ratings1, axis=None)
    .apply(highlight_cells3, axis=None)
    .apply(highlight_cells4, axis=None)
    .apply(highlight_cells5, axis=None)
    .apply(highlight_cells6, axis=None)
    .apply(highlight_cells7, axis=None)
    # .set_table_attributes("style='display:inline'")
    # .set_caption('Caption table 1')
)

The 'system_rating_r1' dataframe.

Styling system_rating_r2 using the eight functions and the indexes computed above.

In [451]:
def Ratings8(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(18):#range(19,37):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight9(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(18):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#ff5500;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
    return df 

def highlight_cells10(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(18):#below_avg1[3:21]
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 


def highlight_cells11(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(18):#below_avg3[11:27]:
        df.iloc[i,2] = 'background-color:#ff5500;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells12(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [4, 6]:#below_avg3[11:27]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 


def highlight_cells13(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [4, 6,  11, 12, 14, 15, 16, 17]:#above_avg[4:12]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 

def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 

system_rating_r2 = (
    system_rating_r2.style.apply(Ratings_highlight9, axis=None)
    .set_table_styles([
        {'selector': '', 'props': [('border', '3px solid #ff5500')]},
        {'selector': 'thead', 'props': [('background-color', 'white'), ('color', '#ff5500')]},  # heading
        # {'selector': 'td', 'props': [('background-color', 'white'), ('color', 'black')]},  # inside chart
        {'selector': 'th.row_heading', 'props': [('background-color', 'white'), ('color', '#ff5500')]},  # index
    ])
    .apply(Ratings8, axis=None)
    .apply(highlight_cells10, axis=None)
    .apply(highlight_cells11, axis=None)
    .apply(highlight_cells12, axis=None)
    .apply(highlight_cells13, axis=None)
    # .set_table_attributes("style='display:inline'")
    # .set_caption('Caption table 2')
    # .apply(borders, axis=None)
    # display_html(df1_style._repr_html_() + df2_style._repr_html_(), raw=True)
)

The 'system_rating_r2' dataframe.

Styling system_rating_r3 using the eight functions and the indexes computed above.

In [452]:
def Ratings14(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(18):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight15(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(18):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#ff5500;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
    return df 

def highlight_cells16(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(18):
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells17(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[13,1] = 'background-color:#ff5500;color:white;border-bottom: 2px solid black'
    return df 


def highlight_cells18(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(18):
        df.iloc[i,2] = 'background-color:#ff5500;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells19(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[12,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 


def highlight_cells20(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [4, 5, 6, 10, 11, 12, 14]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 

def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 

system_rating_r3 = (
    system_rating_r3.style.apply(Ratings_highlight15, axis=None)
    .apply(Ratings14, axis=None)
    .apply(highlight_cells16, axis=None)
    .apply(highlight_cells17, axis=None)
    .apply(highlight_cells18, axis=None)
    .apply(highlight_cells19, axis=None)
    .apply(highlight_cells20, axis=None)
    .set_table_attributes("style='display:inline'")
    .set_table_styles([
        {'selector': '', 'props': [('border', '3px solid #ff5500')]},
        {'selector': 'thead', 'props': [('background-color', 'white'), ('color', '#ff5500')]},  # heading
        # {'selector': 'td', 'props': [('background-color', 'white'), ('color', 'black')]},  # inside chart
        {'selector': 'th.row_heading', 'props': [('background-color', 'white'), ('color', '#ff5500')]},  # index
    ])
    .set_properties(**{'text-align': 'center'})
    # .set_caption('Caption table 3')
    # .apply(borders, axis=None)
)

The 'system_rating_r3' dataframe.
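The three styling cells above repeat the same pattern with different row lists, columns and CSS strings. As a hedged alternative, and not what the notebook actually does, a single factory could build each of those functions. The name `make_cell_styler` is hypothetical; the only pandas behaviour relied on is that a function passed to `Styler.apply(..., axis=None)` receives the whole dataframe and returns a same-shaped dataframe of CSS strings.

```python
import pandas as pd


def make_cell_styler(column, rows, css):
    """Build a function suitable for Styler.apply(..., axis=None).

    The returned function gives back a DataFrame of CSS strings with the same
    index and columns as its input; only the requested row positions of one
    column receive a style, every other cell gets an empty string.
    """
    def styler(data):
        styles = pd.DataFrame('', index=data.index, columns=data.columns)
        col = data.columns.get_loc(column)
        for row in rows:
            if row < len(data):  # ignore positions that fall outside this table
                styles.iloc[row, col] = css
        return styles
    return styler


# Made-up three-row table, used only to show the shape of the result.
table = pd.DataFrame({'Name of Movie': ['A', 'B', 'C'],
                      'Cost': ['$1', '$2', '$3']})
yellow_cost = make_cell_styler('Cost', [0, 2, 5], 'background-color:yellow;color:black')
styles = yellow_cost(table)
print(styles)
```

Assuming `table` is still a plain dataframe (not yet converted to a Styler), it could then be styled with `table.style.apply(yellow_cost, axis=None)`. The bounds check matters because a position outside the table would otherwise raise an `IndexError`.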

Saving the styled system_rating_r1 dataframe to the file system_rating_r1.png as an image, to be used in the analysis later on.

In [454]:
dfi.export(system_rating_r1, 'system_rating_r1.png')

Saving the styled system_rating_r2 dataframe to the file system_rating_r2.png as an image, to be used in the analysis later on.

In [455]:
dfi.export(system_rating_r2, 'system_rating_r2.png')

Saving the styled system_rating_r3 dataframe to the file system_rating_r3.png as an image, to be used in the analysis later on.

In [456]:
dfi.export(system_rating_r3, 'system_rating_r3.png')

This function allows all three dataframes to be displayed side by side in the analysis below.

In [225]:
def display_side_by_side(*args):
    html_str = "<center><font size=6 style='color:#ff5500'>The Return On Investment on R-rated Movies.</font></center> <br>  " 
 
    for df in args:
        html_str += df.to_html()
    display_html(
        html_str.replace('table','table style="display:inline"'), 
        raw=True
    )
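One detail of the function above is that `str.replace('table', ...)` rewrites every occurrence of the word, including closing `</table>` tags and any cell text that contains "table". The sketch below is an assumed variant, not the notebook's code, that touches only opening tags; `inline_tables` is a hypothetical helper name and the sample HTML is made up.

```python
import re


def inline_tables(html):
    """Add an inline display style to every opening <table> tag.

    Closing tags and cell text that happens to contain the word 'table'
    are left unchanged.
    """
    return re.sub(r'<table\b', '<table style="display:inline"', html)


sample = '<table id="T_1"><tr><td>Round table</td></tr></table>'
result = inline_tables(sample)
print(result)
```

This assumes the table tag does not already carry a `style` attribute; if it does, the attribute would end up duplicated and the existing one should be extended instead.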

Below is the start of the creation of the dataframes for the 'G-rated' movies in the 'Drama Genre', based on the 'ROI' of each movie.

Index of all the 'G' rated movies.

In [385]:
g_index = []
for i,x in enumerate(Drama_DataFrame.Rating):
    if x == 'G':g_index.append(i)
print(g_index) #showing the g_index list 
[227, 228, 229, 230, 231, 232, 233, 234, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270]

Checking the number of elements in the 'g_index' list.

In [458]:
len(g_index)
Out[458]:
34
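The index lookup above can also be written as one list comprehension. This is a minimal sketch with a hypothetical helper name, `indices_where`, and a made-up ratings list rather than the `Drama_DataFrame.Rating` column.

```python
def indices_where(values, target):
    """Return the positions at which `values` equals `target`."""
    return [i for i, value in enumerate(values) if value == target]


ratings = ['PG', 'R', 'G', 'G', 'PG-13', 'G']  # made-up sample, not the notebook's data
g_positions = indices_where(ratings, 'G')
print(g_positions)
```

The same helper would serve for the other system ratings by changing only the `target` argument.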

Getting the Cost for all 'G' rated movies.

In [71]:
g_cost = []
for i in g_index:
        g_cost.append(Drama_DataFrame.Production_Budget_x[i])
print(g_cost) #showing the g_cost list 
['$35,446,775', '$700,000', '$8,600,000', '$7,000,000', '$18,000,000', '$4,400,000', '$17,000,000', '$22,000,000', '$20,000,000', '$23,000,000', '$15,000,000', '$2,700,000', '$70,000,000', '$30,000,000', '$2,500,000', '$90,000,000', '$666,000', '$85,000,000', '$17,000,000', '$10,000,000', '$22,000,000', '$18,000,000', '$8,200,000', '$60,000,000', '$45,000,000', '$858,000', '$17,000,000', '$300,000', '$10,000,000', '$6,400,000', '$13,000,000', '$1,750,000', '$1,700,000', '$3,000,000']

Checking the number of elements in the 'g_cost' list.

In [227]:
len(g_cost)
Out[227]:
34

Getting the Name for all 'G' rated movies.

In [72]:
g_name = []
for i in g_index:
        g_name.append(Drama_DataFrame.Movie[i])
print(g_name) #showing the g_name list
['La traviata', 'A Sunday in the Country', 'Little Dorrit', 'Prancer', 'The Secret Garden', 'Through the Olive Trees', 'A Little Princess', 'The Rookie', 'Beauty and the Beast 1991', 'The Little Rascals', 'Ramona and Beezus', 'The Black Stallion', 'The Hunchback of Notre Drame', 'Babe', 'Pollyanna', 'Babe: Pig in the City', 'Lassie Come Home', "Charlotte's Web", 'A Little Princess', 'Kit Kittredge: An American Girl', 'The Rookie', 'The Secret Garden', 'The Sound of Music', 'The Tale of Despereaux', 'The Lion King 1994', 'Bambi 1942', 'My Fair Lady 1964', 'Before the Wrath', "Hachiko: A Dog's Story", 'Giant', 'The Ten Commandments 1966', 'The Quiet Man', 'Three Cions in the Fountain', 'Miracle of Marcelino']

Checking the number of elements in the 'g_name' list.

In [461]:
len(g_name)
Out[461]:
34

Getting the ROI for all 'G' rated movies.

In [73]:
g_return_on_investment = []
for i in g_index:
        g_return_on_investment.append(Drama_DataFrame.Profit_x[i])
print(g_return_on_investment) #showing the g_return_on_investment list
['$-35,251,281', '$1,711,143', '$-7,574,772', '$11,587,135', '$-9,278,757', '$-4,359,700', '$-6,984,551', '$58,693,537', '$418,656,843', '$43,947,950', '$12,469,621', '$35,099,643', '$255,500,000', '$216,100,000', '$1,250,000', '$-20,868,140', '$3,851,000', '$58,985,708', '$-6,984,551', '$7,657,973', '$58,491,516', '$293,281,000', '$278,014,195', '$30,482,317', '$941,214,868', '$267,142,000', '$55,071,636', '$-191,002', '$37,707,417', '$23,794,409', '$52,500,000', '$5,850,377', '$10,300,000', '$-2,407,139']

Checking the number of elements in the 'g_return_on_investment' list.

In [463]:
len(g_return_on_investment)
Out[463]:
34

Getting the Ratings of all 'G' rated movies.

In [74]:
g_rating = []
for i in g_index:
        g_rating.append(Drama_DataFrame.Averagerating[i])
print(g_rating) #showing the g_rating list
[7.2, 7.6, 7.3, 6.4, 7.3, 7.8, 7.7, 6.9, 8.0, 6.3, 6.5, 7.4, 7.0, 9.6, 9.0, 5.8, 7.1, 6.3, 7.6, 6.5, 6.9, 7.3, 8.1, 6.1, 8.5, 7.3, 7.8, 6.6, 8.1, 7.6, 7.9, 7.7, 6.3, 7.1]

Checking the number of elements in the 'g_rating' list.

In [465]:
len(g_rating)
Out[465]:
34

Getting the Profit Percentage of all 'G' rated movies.

In [386]:
g_percent_profit = []
for i in g_index:
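    # profit as a percentage of the production budget (kept separate from the loop index i)
    percent = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
    g_percent_profit.append(int(round(percent,0)))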
print(g_percent_profit) #showing the g_percent_profit list
[-99, 244, -88, 166, -52, -99, -41, 267, 2093, 191, 83, 1300, 365, 720, 50, -23, 578, 69, -41, 77, 266, 1629, 3390, 51, 2092, 31135, 324, -64, 377, 372, 404, 334, 606, -80]

Checking the number of elements in the 'g_percent_profit' list.

In [468]:
len(g_percent_profit)
Out[468]:
34
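The percentage above is profit divided by production budget, times 100, rounded to a whole number. The sketch below reproduces that arithmetic from the formatted strings instead of the numeric columns; `parse_money` and `roi_percent` are hypothetical helper names, and it is an assumption that the string columns carry the same amounts as the numeric `Profit` and `Production_Budget` columns. The three sample pairs are the first three 'G' rated figures printed above.

```python
def parse_money(text):
    """Convert a string such as '$-35,251,281' to the integer -35251281."""
    return int(text.replace('$', '').replace(',', ''))


def roi_percent(profit, budget):
    """Profit as a whole-number percentage of the production budget."""
    return int(round(parse_money(profit) / parse_money(budget) * 100, 0))


profits = ['$-35,251,281', '$1,711,143', '$-7,574,772']
budgets = ['$35,446,775', '$700,000', '$8,600,000']
percents = [roi_percent(p, b) for p, b in zip(profits, budgets)]
print(percents)
```

These three results agree with the first three entries of the `g_percent_profit` list printed above.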

Converting the integer ROI percentage values of all 'G' rated movies to percentage strings.

In [76]:
g_roi_percent = []
for i in g_percent_profit:
    g_roi_percent.append("{:}%".format(i))
print(g_roi_percent) #showing the g_roi_percent list
['-99%', '244%', '-88%', '166%', '-52%', '-99%', '-41%', '267%', '2093%', '191%', '83%', '1300%', '365%', '720%', '50%', '-23%', '578%', '69%', '-41%', '77%', '266%', '1629%', '3390%', '51%', '2092%', '31135%', '324%', '-64%', '377%', '372%', '404%', '334%', '606%', '-80%']

Checking the number of elements in the 'g_roi_percent' list.

In [470]:
len(g_roi_percent)
Out[470]:
34

Turning the rating of each 'G' rated movie into a string of stars, one star for each whole rating point.

In [77]:
g_stars = []
for i in g_rating:
    g_stars.append('*'*int(i))
print(g_stars) #showing the g_stars list
['*******', '*******', '*******', '******', '*******', '*******', '*******', '******', '********', '******', '******', '*******', '*******', '*********', '*********', '*****', '*******', '******', '*******', '******', '******', '*******', '********', '******', '********', '*******', '*******', '******', '********', '*******', '*******', '*******', '******', '*******']

Checking the number of elements in the 'g_stars' list.

In [472]:
len(g_stars)
Out[472]:
34
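Because `int()` truncates rather than rounds, a rating of 6.9 is shown with six stars, not seven. The sketch below makes that behaviour explicit; `to_stars` is a hypothetical helper name and the two ratings are values that appear in the `g_rating` list above.

```python
def to_stars(rating, symbol='*'):
    """Repeat `symbol` once per whole rating point (the fraction is dropped)."""
    return symbol * int(rating)


six = to_stars(6.9)
nine = to_stars(9.6)
print(six, nine)
```

If rounding to the nearest star were preferred, `int(round(rating))` could be used instead, which would change the tables shown in this analysis.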

Creating the 'G' rated dataframe with the variables previously created.

In [133]:
system_rating_g = pd.DataFrame({"Name of Movie":g_name, "Cost":g_cost, 
                                "Return On Investment":g_return_on_investment, 
                                "ROI Percentage":g_roi_percent,"Ratings":g_stars})

The 'system_rating_g' dataframe. (this dataframe is interactive)

In [387]:
system_rating_g
Out[387]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

Getting the index of all the movies whose ROI Percentage is negative or zero.

In [388]:
neg_values = []
for i,x in enumerate(g_percent_profit): 
    if x <= 0: neg_values.append(i)
print(neg_values) #showing the neg_values list
[0, 2, 4, 5, 6, 15, 18, 27, 33]

Checking the number of elements in the 'neg_values' list.

In [501]:
len(neg_values)
Out[501]:
9

Dropping the negative values and resetting the index of the system_rating_g dataframe.

In [390]:
system_rating_g = system_rating_g.drop(labels=neg_values, axis=0)
system_rating_g = system_rating_g.reset_index(drop=True)
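The same filtering can be done before the dataframe is built, by keeping the parallel lists together with `zip`. This is an illustrative sketch with a hypothetical helper name, `keep_profitable`; the four names and percentages are the first four 'G' rated entries printed above.

```python
def keep_profitable(names, percent_profit):
    """Keep only the (name, percentage) pairs with a positive percentage."""
    return [(name, pct) for name, pct in zip(names, percent_profit) if pct > 0]


names = ['La traviata', 'A Sunday in the Country', 'Little Dorrit', 'Prancer']
percent_profit = [-99, 244, -88, 166]
kept = keep_profitable(names, percent_profit)
print(kept)
```

Filtering pairs this way avoids a separate list of positions, so there is no index to reset afterwards.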

This is the System_rating_g dataframe. It will be divided into two dataframes.

In [391]:
system_rating_g
Out[391]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

System_rating_g1 is the first dataframe. (this dataframe is interactive)

In [392]:
system_rating_g1=system_rating_g[:12]
system_rating_g1
Out[392]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

System_rating_g2 is the second dataframe. (this dataframe is interactive)

In [393]:
system_rating_g2=system_rating_g[12:]
system_rating_g2
Out[393]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

Getting the average Budget of all the 'G' rated movies in the Drama genre.

In [479]:
g_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                 for i in system_rating_g['Cost']]) / len(system_rating_g['Cost'])

The average Budget of all the 'G' rated Drama movies is $19,698,960.

In [480]:
g_avg_value
Out[480]:
19698960.0
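The average above is computed by stripping the currency formatting and dividing the total by the number of rows. A minimal sketch of the same idea with the standard library follows; `parse_money` and `average_money` are hypothetical helper names, and the three costs are a small subset of the 'G' rated costs printed earlier, so the result is not the $19,698,960 figure for the full table.

```python
from statistics import mean


def parse_money(text):
    """Convert a string such as '$7,000,000' to the integer 7000000."""
    return int(text.replace('$', '').replace(',', ''))


def average_money(values):
    """Mean of a list of currency-formatted strings."""
    return mean(parse_money(value) for value in values)


sample_costs = ['$700,000', '$7,000,000', '$22,000,000']
sample_average = average_money(sample_costs)
print(f'${sample_average:,.0f}')
```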

Getting the index of all the movies that are at or below the average Budget of all the 'G' rated Drama movies. Getting the index of all the movies that are at or above the average Budget of all the 'G' rated Drama movies.

In [481]:
g_cost_index = [int(i.replace('$', '').replace(',', ''))for i in system_rating_g['Cost']]
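# at or below the average budget
g_below_avg5 = []
for i,x in enumerate(g_cost_index):
    if x <= 19698960:g_below_avg5.append(i)
    
# at or above the average budget (the name is kept because later cells use it)
g_below_avg6 = []
for i,x in enumerate(g_cost_index):
    if x >= 19698960:g_below_avg6.append(i)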

The 'g_below_avg5' list.

In [482]:
print(g_below_avg5)
[0, 1, 5, 6, 9, 10, 12, 14, 15, 18, 19, 20, 21, 22, 23, 24]

The 'g_below_avg6' list.

In [483]:
print(g_below_avg6)
[2, 3, 4, 7, 8, 11, 13, 16, 17]
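The two lists above use `<=` and `>=`, so a value exactly equal to the average would appear in both. The sketch below is an assumed variant, not the notebook's code, that computes the average and returns two disjoint position lists in one call; `split_by_average` is a hypothetical name and the sample values are made up.

```python
def split_by_average(values):
    """Return (average, positions below it, positions at or above it)."""
    average = sum(values) / len(values)
    below = [i for i, v in enumerate(values) if v < average]
    at_or_above = [i for i, v in enumerate(values) if v >= average]
    return average, below, at_or_above


sample = [10, 20, 30, 60]  # illustrative numbers only
average, below, at_or_above = split_by_average(sample)
print(average, below, at_or_above)
```

Here the value 30 equals the average and lands only in the second list, so every row gets exactly one highlight color.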

Getting the average Return On Investment of all the 'G' rated movies in the Drama genre.

In [489]:
g_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                 for i in system_rating_g['Return On Investment']]) / len(system_rating_g['Cost'])

The average Return On Investment of all the 'G' rated Drama movies is $127,174,411.

In [490]:
g_avg_value
Out[490]:
127174411.52

Getting the index of all the movies that are at or below the average Return On Investment of all the 'G' rated Drama movies. Getting the index of all the movies that are at or above the average Return On Investment of all the 'G' rated Drama movies.

In [486]:
g_roi_index = [int(i.replace('$', '').replace(',', ''))for i in system_rating_g['Return On Investment']]
# at or below the average ROI
g_below_avg7 = []
for i,x in enumerate(g_roi_index):
    if x <= 127174411:g_below_avg7.append(i)
    
# at or above the average ROI (the name is kept because later cells use it)
g_below_avg8 = []
for i,x in enumerate(g_roi_index):
    if x >= 127174411:g_below_avg8.append(i)

The 'g_below_avg7' list.

In [487]:
print(g_below_avg7)
[0, 1, 2, 4, 5, 6, 9, 10, 11, 12, 13, 16, 19, 20, 21, 22, 23, 24]

The 'g_below_avg8' list.

In [488]:
print(g_below_avg8)
[3, 7, 8, 14, 15, 17, 18]

Getting the average Return On Investment Percentage of all the 'G' rated movies in the Drama genre.

In [491]:
g_avg_value = sum([int(i.replace('%', ''))
                 for i in system_rating_g['ROI Percentage']]) / len(system_rating_g['Cost'])

The average Return On Investment Percentage of all the 'G' rated Drama movies is 1887%.

In [492]:
g_avg_value
Out[492]:
1887.32

Getting the index of all the movies that are at or above the average Return On Investment Percentage of all the 'G' rated Drama movies.

In [493]:
roi_percent_index_g = [int(i.replace('%', ''))for i in system_rating_g['ROI Percentage']]
# at or above the average ROI Percentage
g_above_avg = []
for i,x in enumerate(roi_percent_index_g):
    if x >= 1887:g_above_avg.append(i)

The 'g_above_avg' list.

In [494]:
g_above_avg
Out[494]:
[3, 15, 17, 18]

Styling system_rating_g1 using the eight functions and the indexes computed above.

In [517]:
def Ratings21(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(12):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight22(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(12):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:red;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df 


def highlight_cells23(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in g_below_avg5[:6]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df 

def highlight_cells24(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in g_below_avg6[:6]:
        df.iloc[i,1] = 'background-color:red;color:white;border-bottom:2px solid black'
    return df 

def highlight_cells25(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in g_below_avg7[:9]:
        df.iloc[i,2] = 'background-color:red;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells26(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in g_below_avg8[:3]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells27(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[3,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 


def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 


system_rating_g1 = (
    system_rating_g1.style.apply(Ratings_highlight22, axis=None)
    .set_table_styles([
        {'selector': '', 'props': [('border', '3px solid red')]},
        {'selector': 'thead', 'props': [('background-color', 'white'), ('color', 'red')]},  # heading
        # {'selector': 'td', 'props': [('background-color', 'white'), ('color', 'black')]},  # inside chart
        {'selector': 'th.row_heading', 'props': [('background-color', 'white'), ('color', 'red')]},  # index
    ])
    .apply(Ratings21, axis=None)
    .apply(highlight_cells23, axis=None)
    .apply(highlight_cells24, axis=None)
    .apply(highlight_cells25, axis=None)
    .apply(highlight_cells26, axis=None)
    .apply(highlight_cells27, axis=None)
    # .set_table_attributes("style='display:inline'")
    # .set_caption('Caption table 1')
)
            

The 'system_rating_g1' dataframe.

Styling system_rating_g2 using the eight functions and the indexes computed above.

In [518]:
def Ratings28(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(13):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight29(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(13):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:red;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df 


def highlight_cells30(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [0, 2, 3, 6, 7, 8, 9, 10, 11, 12]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df 

def highlight_cells31(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [1,4,5]:
        df.iloc[i,1] = 'background-color:red;color:white;border-bottom:2px solid black'
    return df 

def highlight_cells32(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [0, 1, 4, 7, 8, 9, 10, 11, 12]:
        df.iloc[i,2] = 'background-color:red;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells33(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [2, 3, 5, 6]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells34(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [3, 5, 6]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 


def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 


system_rating_g2 = (
    system_rating_g2.style.apply(Ratings_highlight29, axis=None)
    .set_table_styles([
        {'selector': '', 'props': [('border', '3px solid red')]},
        {'selector': 'thead', 'props': [('background-color', 'white'), ('color', 'red')]},  # heading
        # {'selector': 'td', 'props': [('background-color', 'white'), ('color', 'black')]},  # inside chart
        {'selector': 'th.row_heading', 'props': [('background-color', 'white'), ('color', 'red')]},  # index
    ])
    .apply(Ratings28, axis=None)
    .apply(highlight_cells30, axis=None)
    .apply(highlight_cells31, axis=None)
    .apply(highlight_cells32, axis=None)
    .apply(highlight_cells33, axis=None)
    .apply(highlight_cells34, axis=None)
    # .set_table_attributes("style='display:inline'")
    # .set_caption('Caption table 1')
)
            

The 'system_rating_g2' dataframe.

Saving the styled system_rating_g1 dataframe to the file system_rating_g1.png as an image, to be used in the analysis later on.

In [519]:
dfi.export(system_rating_g1, 'system_rating_g1.png')

Saving the styled system_rating_g2 dataframe to the file system_rating_g2.png as an image, to be used in the analysis later on.

In [511]:
dfi.export(system_rating_g2, 'system_rating_g2.png')

This function allows both dataframes to be displayed side by side.

In [250]:
def display_side_by_side2(*args):
    html_str = "<center><font size=6 style='color:red'>The Return On Investment on G-rated Movies.</font></center> <br>  " 
 
    for df in args:
        html_str += df.to_html()
    display_html(
        html_str.replace('table','table style="display:inline"'), 
        raw=True
    )

Below will be the creation of dataframes that are in the 'Drama Genre' that are 'PG-rated'.

Index of all the 'PG' rated movies.

In [394]:
pg_index = []
for i,x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG':pg_index.append(i)
print(pg_index) #showing the pg_index list
[0, 31, 40, 61, 62, 129, 141, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213]

Checking the number of elements in the 'pg_index' list.

In [251]:
len(pg_index)
Out[251]:
67
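The 'PG' cells below repeat the per-column lookups already done for the 'G' rated movies. As a hedged alternative, not what the notebook does, one helper could collect several columns for any rating in a single pass. The name `rows_for_rating` is hypothetical, and the records are plain dictionaries built from three rows printed in this notebook; with pandas, records of this shape could be produced by `DataFrame.to_dict('records')`.

```python
def rows_for_rating(records, rating, columns):
    """Return one list per requested column, limited to rows with the given rating."""
    selected = [row for row in records if row['Rating'] == rating]
    return {column: [row[column] for row in selected] for column in columns}


records = [
    {'Movie': 'Hugo', 'Rating': 'PG', 'Production_Budget_x': '$180,000,000'},
    {'Movie': 'La traviata', 'Rating': 'G', 'Production_Budget_x': '$35,446,775'},
    {'Movie': 'Dolphin Tale', 'Rating': 'PG', 'Production_Budget_x': '$37,000,000'},
]
pg = rows_for_rating(records, 'PG', ['Movie', 'Production_Budget_x'])
print(pg)
```

Because every column is taken from the same selected rows, the resulting lists always have equal length, which is what the repeated `len(...)` checks in this section verify by hand.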

Getting the Cost for all 'PG' rated movies.

In [86]:
pg_cost = []
for i in pg_index:
        pg_cost.append(Drama_DataFrame.Production_Budget_x[i])
print(pg_cost) #showing the pg_cost list
['$180,000,000', '$37,000,000', '$31,000,000', '$20,000,000', '$20,000,000', '$3,000,000', '$1,700,000', '$5,100,000', '$10,000,000', '$95,000,000', '$3,000,000', '$20,000,000', '$40,000,000', '$5,000,000', '$422,000', '$5,100,000', '$72,000,000', '$11,800,000', '$15,000,000', '$32,000,000', '$40,000,000', '$65,000,000', '$8,000,000', '$9,000,000', '$17,000,000', '$30,000,000', '$500,000', '$20,000,000', '$11,000,000', '$2,000,000', '$23,000,000', '$45,000,000', '$15,000,000', '$10,000,000', '$32,000,000', '$90,000,000', '$10,000,000', '$27,000,000', '$16,000,000', '$3,000,000', '$15,000,000', '$25,000,000', '$34,000,000', '$10,000,000', '$20,000,000', '$15,000,000', '$12,000,000', '$5,000,000', '$7,000,000', '$14,000,000', '$15,000,000', '$12,000,000', '$28,300,000', '$8,000,000', '$7,500,000', '$17,000,000', '$5,000,000', '$9,000,000', '$15,000,000', '$22,000,000', '$5,000,000', '$4,500,000', '$4,500,000', '$8,000,000', '$16,000,000', '$8,200,000', '$28,000,000']

Checking the number of elements in the 'pg_cost' list.

In [522]:
len(pg_cost)
Out[522]:
67

Getting the name of all 'PG' rated movies.

In [87]:
pg_name = []
for i in pg_index:
        pg_name.append(Drama_DataFrame.Movie[i])
print(pg_name) #showing the pg_name list
['Hugo', 'Dolphin Tale', 'Extraordinary Measures', 'Wonder', 'The Last Song', 'War Room', 'The Lunchbox', 'Somewhere in Time', 'Urban Cowboy', 'Cinderella', 'War Room', 'Wonder', 'Little Women', 'Overcomer', 'The Jazz Singer', 'Cattle Annie and Little Britches', 'The Majestic', 'A Walk to Remember', 'Tuck Everlasting', 'Dreamer', 'The Lake House', 'We Are Marshall', 'Akeelah and the Bee', 'The Ultimate Gift', 'Bridge to Terabithia', 'August Rush', 'Fireproof', 'The Last Song', 'What If...', "God's Not Dead", "Mr. Holland's Opus", 'The Indian in the Cupboard', 'Fluke', 'Three Wishes', 'Phenomenon', 'Contact', 'The Spanish Prisoner', 'Music of the Heart', 'Sense and Sensibility', 'The Secret of Roan Inish', 'The Remains of the Day', 'Gettysburg', 'The Age of Innocence', 'Pure Country', 'Forever Young', 'Newsies', 'A River Runs Through It', 'Honeysuckle Rose', 'Resurrection', 'Taps', 'On Golden Pond', 'Absence of Malice', 'Ragtime', 'Looker', 'The Night the Lights Went Out in Georgia', 'Rocky III', 'Tex', 'Six Weeks', 'Five Days One Summer', 'Staying Alive', 'Eddie and the Cruisers', 'Tender Mercies', 'Testament', 'Table for Five', 'Man, Woman and Child', 'Footloose', 'The Natural']

Checking the number of elements in the 'pg_name' list.

In [524]:
len(pg_name)
Out[524]:
67

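Getting the profit of all 'PG' rated movies from the Profit_x column. The dataframe built later labels these dollar amounts 'Return On Investment'.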

In [88]:
pg_return_on_investment = []
for i in pg_index:
        pg_return_on_investment.append(Drama_DataFrame.Profit_x[i])
print(pg_return_on_investment) #showing the pg_return_on_investment list
['$47,784', '$59,068,724', '$-15,173,016', '$284,604,712', '$72,678,948', '$70,975,239', '$10,531,500', '$4,609,597', '$36,918,287', '$447,351,353', '$70,986,904', '$285,937,718', '$176,601,214', '$33,102,988', '$26,696,000', '$-4,565,184', '$-34,693,666', '$35,694,916', '$4,344,615', '$6,741,732', '$74,830,111', '$-21,454,636', '$10,948,425', '$-5,561,265', '$120,587,063', '$34,605,762', '$32,973,297', '$69,137,047', '$-2,473,712', '$62,667,874', '$83,269,971', '$-9,343,870', '$-11,012,232', '$-2,974,504', '$120,036,382', '$81,120,329', '$3,835,130', '$-12,140,606', '$118,582,776', '$3,101,815', '$48,954,968', '$-14,230,040', '$-1,744,560', '$5,164,458', '$107,956,187', '$-12,180,515', '$31,440,294', '$12,815,212', '$150,297,525', '$21,856,053', '$104,285,432', '$28,716,963', '$-13,379,219', '$-4,718,768', '$7,423,752', '$108,052,686', '$544,368,315', '$-2,331,975', '$-14,800,922', '$42,892,670', '$-213,211', '$3,943,124', '$-2,455,108', '$-5,600,000', '$-14,294,092', '$71,808,942', '$20,000,000']

Checking the number of elements in the 'pg_return_on_investment' list.

In [526]:
len(pg_return_on_investment)
Out[526]:
67

Getting the average rating of all 'PG' rated movies.

In [89]:
pg_rating = []
for i in pg_index:
        pg_rating.append(Drama_DataFrame.Averagerating[i])
print(pg_rating) #showing the pg_rating list
[7.5, 6.9, 6.5, 8.0, 6.0, 6.5, 7.8, 7.2, 6.4, 6.9, 6.5, 8.0, 7.8, 6.6, 5.9, 6.1, 6.9, 7.3, 6.6, 6.8, 6.8, 7.1, 7.4, 7.3, 7.1, 7.5, 6.5, 6.0, 6.4, 4.7, 7.3, 6.0, 6.7, 6.1, 6.4, 7.5, 7.2, 6.8, 7.7, 7.5, 7.8, 7.6, 7.2, 7.0, 6.3, 6.9, 7.2, 6.3, 7.3, 6.8, 7.6, 6.9, 7.3, 6.1, 6.0, 6.8, 6.5, 5.7, 6.1, 4.7, 6.9, 7.4, 7.0, 6.1, 6.1, 6.6, 7.5]

Checking the number of elements in the 'pg_rating' list.

In [528]:
len(pg_rating)
Out[528]:
67

Getting the profit percentage (profit divided by production budget, times 100) of all 'PG' rated movies.

In [395]:
pg_percent_profit = []
for i in pg_index:
    percent = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100 #profit as a percentage of budget
    pg_percent_profit.append(int(round(percent,0)))
print(pg_percent_profit) #showing the pg_percent_profit list
[0, 160, -49, 1423, 363, 2366, 620, 90, 369, 471, 2366, 1430, 442, 662, 6326, -90, -48, 302, 29, 21, 187, -33, 137, -62, 709, 115, 6595, 346, -22, 3133, 362, -21, -73, -30, 375, 90, 38, -45, 741, 103, 326, -57, -5, 52, 540, -81, 262, 256, 2147, 156, 695, 239, -47, -59, 99, 636, 10887, -26, -99, 195, -4, 88, -55, -70, -89, 876, 71]
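
As a worked check of the formula, the sketch below recomputes the first three percentages from the formatted strings printed earlier. It assumes the `Profit_x` and `Production_Budget_x` strings carry the same amounts as the numeric `Profit` and `Production_Budget` columns; `to_number` and `roi_percent` are hypothetical helper names, not part of the notebook.

```python
def to_number(money: str) -> int:
    """Parse a string such as '$-15,173,016' into an integer."""
    return int(money.replace("$", "").replace(",", ""))

def roi_percent(profit: str, budget: str) -> int:
    """Profit as a whole-number percentage of the production budget."""
    return int(round(to_number(profit) / to_number(budget) * 100, 0))

# (profit, budget) pairs for the first three PG rows printed above.
examples = [
    ("$47,784", "$180,000,000"),
    ("$59,068,724", "$37,000,000"),
    ("$-15,173,016", "$31,000,000"),
]
results = [roi_percent(p, b) for p, b in examples]
print(results)
```

The three results agree with the first three entries of `pg_percent_profit`.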

Checking the number of elements in the 'pg_percent_profit' list.

In [256]:
len(pg_percent_profit)
Out[256]:
67

Formatting the integer ROI percentages of all 'PG' rated movies as percentage strings.

In [91]:
pg_roi_percent = []
for i in pg_percent_profit:
    pg_roi_percent.append("{:}%".format(i))
print(pg_roi_percent) #showing the pg_roi_percent list
['0%', '160%', '-49%', '1423%', '363%', '2366%', '620%', '90%', '369%', '471%', '2366%', '1430%', '442%', '662%', '6326%', '-90%', '-48%', '302%', '29%', '21%', '187%', '-33%', '137%', '-62%', '709%', '115%', '6595%', '346%', '-22%', '3133%', '362%', '-21%', '-73%', '-30%', '375%', '90%', '38%', '-45%', '741%', '103%', '326%', '-57%', '-5%', '52%', '540%', '-81%', '262%', '256%', '2147%', '156%', '695%', '239%', '-47%', '-59%', '99%', '636%', '10887%', '-26%', '-99%', '195%', '-4%', '88%', '-55%', '-70%', '-89%', '876%', '71%']

Checking the number of elements in the 'pg_roi_percent' list.

In [531]:
len(pg_roi_percent)
Out[531]:
67

Turning the whole-number part of each movie's average rating into a string of stars for all 'PG' rated movies.

In [92]:
pg_stars = []
for i in pg_rating:
    pg_stars.append('*'*int(i))
print(pg_stars) #showing the pg_stars list
['*******', '******', '******', '********', '******', '******', '*******', '*******', '******', '******', '******', '********', '*******', '******', '*****', '******', '******', '*******', '******', '******', '******', '*******', '*******', '*******', '*******', '*******', '******', '******', '******', '****', '*******', '******', '******', '******', '******', '*******', '*******', '******', '*******', '*******', '*******', '*******', '*******', '*******', '******', '******', '*******', '******', '*******', '******', '*******', '******', '*******', '******', '******', '******', '******', '*****', '******', '****', '******', '*******', '*******', '******', '******', '******', '*******']
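
Because `int()` drops the decimals, a 6.9 rating is shown with six stars. The short sketch below shows this on three ratings from the list above, next to a hypothetical rounded variant that the notebook does not use.

```python
ratings = [7.5, 6.9, 4.7]  # three of the PG ratings printed above

# int() truncates, so 6.9 yields six stars rather than seven.
stars = ["*" * int(r) for r in ratings]
print(stars)

# Hypothetical alternative: round to the nearest whole star instead.
rounded_stars = ["*" * round(r) for r in ratings]
print(rounded_stars)
```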

Checking the number of elements in the 'pg_stars' list.

In [533]:
len(pg_stars)
Out[533]:
67

Creating the 'PG' rated dataframe with the variables previously created.

In [143]:
system_rating_pg = pd.DataFrame({"Name of Movie":pg_name, "Cost":pg_cost, 
                                "Return On Investment":pg_return_on_investment, 
                                "ROI Percentage":pg_roi_percent,"Ratings":pg_stars})

The 'system_rating_pg' dataframe. (this dataframe is interactive)

In [396]:
system_rating_pg
Out[396]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

Getting the index of every movie whose ROI percentage is zero or negative.

In [397]:
neg_values = []
for i,x in enumerate(pg_percent_profit): 
    if x <= 0: neg_values.append(i)
print(neg_values) #showing the neg_values list
[0, 2, 15, 16, 21, 23, 28, 31, 32, 33, 37, 41, 42, 45, 52, 53, 57, 58, 60, 62, 63, 64]

Checking the number of elements in the 'neg_values' list.

In [141]:
len(neg_values)
Out[141]:
22

Dropping the negative values and resetting the index of the system_rating_pg dataframe.

In [398]:
system_rating_pg = system_rating_pg.drop(labels=neg_values, axis=0)
system_rating_pg = system_rating_pg.reset_index(drop=True)
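
A minimal sketch of the same filtering step with a boolean mask instead of a list of positions. The `toy` frame is hypothetical: it reuses the first three PG movies and their percentages, and it assumes the percentage is kept as a number rather than a string.

```python
import pandas as pd

# Hypothetical miniature of system_rating_pg with the percentage kept numeric.
toy = pd.DataFrame({
    "Name of Movie": ["Hugo", "Dolphin Tale", "Extraordinary Measures"],
    "ROI Percentage": [0, 160, -49],
})

# Keep only strictly positive percentages, then renumber the rows from 0.
profitable = toy[toy["ROI Percentage"] > 0].reset_index(drop=True)
print(profitable["Name of Movie"].tolist())
```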

The new 'system_rating_pg' dataframe. It will be divided into two dataframes. (this dataframe is interactive)

In [399]:
system_rating_pg
Out[399]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

System_rating_pg1 is the first dataframe. (this dataframe is interactive)

In [400]:
system_rating_pg1=system_rating_pg[:22]
system_rating_pg1
Out[400]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

System_rating_pg2 is the second dataframe. (this dataframe is interactive)

In [401]:
system_rating_pg2=system_rating_pg[22:]
system_rating_pg2
Out[401]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

Getting the average Budget of all the 'PG' rated movies in the Drama genre.

In [541]:
pg_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                 for i in system_rating_pg['Cost']]) / len(system_rating_pg['Cost'])

The average Budget of all the 'PG' rated Drama movies is $18,060,488.

In [543]:
pg_avg_value
Out[543]:
18060488.888888888
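
The same average can be sketched with pandas string methods instead of a list comprehension. The three-row `cost` series is hypothetical sample data in the same '$1,234' format, so its mean is not the notebook's figure.

```python
import pandas as pd

# Hypothetical three-row 'Cost' column in the same '$1,234' string format.
cost = pd.Series(["$37,000,000", "$20,000,000", "$3,000,000"], name="Cost")

# Strip '$' and ',' in one pass, convert to integers, then average.
cost_numeric = cost.str.replace(r"[$,]", "", regex=True).astype("int64")
average_budget = cost_numeric.mean()
print(average_budget)
```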

Getting the index of all the movies whose budget is at or below the average Budget of all the 'PG' rated Drama movies (pg_below_avg5), and the index of all the movies whose budget is at or above it (pg_below_avg6).

In [544]:
pg_cost_index = [int(i.replace('$', '').replace(',', ''))for i in system_rating_pg['Cost']]
#at or below avg
pg_below_avg5 = []
for i,x in enumerate(pg_cost_index):
    if x <= 18060488:pg_below_avg5.append(i)
    
#at or above avg (despite the 'below' in the name)
pg_below_avg6 = []
for i,x in enumerate(pg_cost_index):
    if x >= 18060488:pg_below_avg6.append(i)

The 'pg_below_avg5' list.

In [545]:
print(pg_below_avg5)
[3, 4, 5, 6, 8, 11, 12, 13, 14, 17, 18, 20, 22, 26, 27, 28, 29, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 42, 43]

The 'pg_below_avg6' list.

In [546]:
print(pg_below_avg6)
[0, 1, 2, 7, 9, 10, 15, 16, 19, 21, 23, 24, 25, 31, 41, 44]
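
Both comparisons above are inclusive and use a hand-copied, truncated threshold, so a budget exactly equal to it would land in both lists. The sketch below, on hypothetical budgets, compares against the computed mean and makes one side strict so that every row falls in exactly one list.

```python
costs = [37_000_000, 20_000_000, 3_000_000, 20_000_000]  # hypothetical budgets
average = sum(costs) / len(costs)  # computed, not copied by hand

# Strict '<' on one side keeps the two lists disjoint.
below = [i for i, c in enumerate(costs) if c < average]
at_or_above = [i for i, c in enumerate(costs) if c >= average]
print(below, at_or_above)
```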

Getting the average Return On Investment of all the 'PG' rated movies in the Drama genre.

In [547]:
pg_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                 for i in system_rating_pg['Return On Investment']]) / len(system_rating_pg['Cost'])

The average Return On Investment of all the 'PG' rated Drama movies is $83,389,266.

In [548]:
pg_avg_value
Out[548]:
83389266.8888889

Getting the index of all the movies that are at or below the average Return On Investment of all the 'PG' rated Drama movies (pg_below_avg7), and the index of all the movies that are at or above it (pg_below_avg8).

In [549]:
pg_roi_index = [int(i.replace('$', '').replace(',', ''))for i in system_rating_pg['Return On Investment']]
#at or below avg
pg_below_avg7 = []
for i,x in enumerate(pg_roi_index):
    if x <= 83389266:pg_below_avg7.append(i)
    
#at or above avg (despite the 'below' in the name)
pg_below_avg8 = []
for i,x in enumerate(pg_roi_index):
    if x >= 83389266:pg_below_avg8.append(i)

The 'pg_below_avg7' list.

In [550]:
print(pg_below_avg7)
[0, 2, 3, 4, 5, 6, 8, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 25, 26, 28, 29, 30, 32, 33, 35, 37, 38, 41, 42, 43, 44]

The 'pg_below_avg8' list.

In [551]:
print(pg_below_avg8)
[1, 7, 9, 10, 18, 24, 27, 31, 34, 36, 39, 40]

Getting the average Return On Investment Percentage of all the 'PG' rated movies in the Drama genre.

In [552]:
pg_avg_value = sum([int(i.replace('%', ''))
                 for i in system_rating_pg['ROI Percentage']]) / len(system_rating_pg['Cost'])

The average Return On Investment Percentage of all the 'PG' rated Drama movies is 1064%.

In [553]:
pg_avg_value
Out[553]:
1064.3555555555556

Getting the index of all the movies that are at or above the average Return On Investment Percentage of all the 'PG' rated Drama movies.

In [554]:
roi_percent_index_pg = [int(i.replace('%', ''))for i in system_rating_pg['ROI Percentage']]
#at or above avg
pg_above_avg = []
for i,x in enumerate(roi_percent_index_pg):
    if x >= 1064:pg_above_avg.append(i)

The 'pg_above_avg' list.

In [555]:
print(pg_above_avg)
[1, 3, 8, 9, 12, 20, 22, 34, 40]

Styling system_rating_pg1 using the eight functions below and the indexes computed above.

In [556]:
def Ratings35(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(22):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight36(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(22):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#FA5F55;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df 


def highlight_cells37(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg_below_avg5[:12]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df 

def highlight_cells38(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg_below_avg6[:10]:
        df.iloc[i,1] = 'background-color:#FA5F55;color:white;border-bottom:2px solid black'
    return df 

def highlight_cells39(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg_below_avg7[:17]:
        df.iloc[i,2] = 'background-color:#FA5F55;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells40(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg_below_avg8[:5]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells41(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg_above_avg[:6]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 


def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 


system_rating_pg1=system_rating_pg1.style.apply(Ratings_highlight36, axis=None)\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid #FA5F55')]},
            {"selector":"thead", 'props':[("background-color","white"),("color","#FA5F55")]},#headinig
            #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','#FA5F55')]}#index
                         ])\
            .apply(Ratings35, axis=None)\
            .apply(highlight_cells37, axis=None)\
            .apply(highlight_cells38, axis=None)\
            .apply(highlight_cells39, axis=None)\
            .apply(highlight_cells40, axis=None)\
            .apply(highlight_cells41, axis=None)
            #.set_table_attributes("style='display:inline'")\
            #.set_caption('Caption table 1')
            
            

The 'system_rating_pg1' dataframe.
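
Each styling function above follows the same pattern: with `axis=None`, `Styler.apply` passes in the whole dataframe and expects back a dataframe of the same shape holding CSS strings. A minimal sketch of that pattern as one reusable function is given below; `highlight_rows`, `styled` and the `toy` frame are hypothetical names for illustration, and the extra keyword arguments rely on `Styler.apply` forwarding them to the function.

```python
import pandas as pd

def highlight_rows(frame: pd.DataFrame, rows, column: int, css: str) -> pd.DataFrame:
    """Return a same-shaped frame of CSS strings with `css` set on the given cells."""
    styles = pd.DataFrame("", index=frame.index, columns=frame.columns)
    for r in rows:
        if r < len(frame):  # skip positions beyond this slice
            styles.iloc[r, column] = css
    return styles

toy = pd.DataFrame({"Name of Movie": ["A", "B", "C"], "Cost": ["$1", "$2", "$3"]})
css_frame = highlight_rows(toy, rows=[0, 2, 5], column=1, css="background-color:yellow")
print(css_frame)

def styled(frame: pd.DataFrame):
    """Attach the highlighter to a Styler (rendering needs jinja2 installed)."""
    return frame.style.apply(highlight_rows, axis=None, rows=[0, 2], column=1,
                             css="background-color:yellow")
```

One function with parameters would stand in for the five near-identical `highlight_cells` definitions, at the cost of passing the row list and colour on each call.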

Styling system_rating_pg2 using the eight functions below and hard-coded row positions relative to the start of this second dataframe.

In [557]:
def Ratings41(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(23):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight42(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(23):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#FA5F55;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df 


def highlight_cells43(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [1, 2, 3, 9, 19, 22]:
        df.iloc[i,1] = 'background-color:#FA5F55;color:white;border-bottom:2px solid black'
    return df 

def highlight_cells44(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [0, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df 

def highlight_cells45(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [0, 3, 4, 6, 7, 8, 10, 11, 13, 15, 16, 18, 19, 20, 21, 22]:
        df.iloc[i,2] = 'background-color:#FA5F55;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells46(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [1, 2, 5, 9, 12, 14, 17]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells47(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [0, 12, 18]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 


def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 


system_rating_pg2 = system_rating_pg2.style.apply(Ratings_highlight42, axis=None)\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid #FA5F55')]},
            {"selector":"thead", 'props':[("background-color","white"),("color","#FA5F55")]},#headinig
            #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','#FA5F55')]}#index
                         ])\
            .apply(Ratings41, axis=None)\
            .apply(highlight_cells43, axis=None)\
            .apply(highlight_cells44, axis=None)\
            .apply(highlight_cells45, axis=None)\
            .apply(highlight_cells46, axis=None)\
            .apply(highlight_cells47, axis=None)
            #.set_table_attributes("style='display:inline'")\
            #.set_caption('Caption table 1')
            
            

The 'system_rating_pg2' dataframe.
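
The hard-coded positions in the functions above can be derived from the lists printed earlier by keeping the entries that fall in the second slice and subtracting its starting row, 22. The sketch below does this for `pg_below_avg6`; `local_positions` is a hypothetical helper name.

```python
# The list as printed earlier for the full 45-row dataframe.
pg_below_avg6 = [0, 1, 2, 7, 9, 10, 15, 16, 19, 21, 23, 24, 25, 31, 41, 44]

def local_positions(positions, start, stop):
    """Positions within [start, stop) re-expressed relative to `start`."""
    return [p - start for p in positions if start <= p < stop]

first_half = local_positions(pg_below_avg6, start=0, stop=22)
second_half = local_positions(pg_below_avg6, start=22, stop=45)
print(first_half)
print(second_half)
```

The second result matches the list in `highlight_cells43`. Applying the same derivation to `pg_below_avg7` and `pg_below_avg8` gives lists that differ from the hard-coded ones in `highlight_cells45` and `highlight_cells46` at local positions 1 and 18, which may be worth rechecking.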

Saving the styled system_rating_pg1 dataframe as the image system_rating_pg1.png, to be used for the analysis later on.

In [559]:
dfi.export(system_rating_pg1, 'system_rating_pg1.png')

Saving the styled system_rating_pg2 dataframe as the image system_rating_pg2.png, to be used for the analysis later on.

In [560]:
dfi.export(system_rating_pg2, 'system_rating_pg2.png')

This function displays the two dataframes side by side.

In [275]:
def display_side_by_side3(*args):
    html_str = "<center><font size=6 style='color:#FA5F55'>The Return On Investment on PG-rated Movies.</font></center> <br>  " 
 
    for df in args:
        html_str += df.to_html()
    display_html(
        html_str.replace('<table','<table style="display:inline"'), 
        raw=True
    )
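
A minimal sketch of the side-by-side idea that can be checked outside a notebook: it only builds the HTML string and leaves the display call out. `side_by_side_html` is a hypothetical name, and plain dataframes are used here instead of the styled ones.

```python
import pandas as pd

def side_by_side_html(title: str, *frames: pd.DataFrame) -> str:
    """Concatenate each frame's HTML and make every table render inline."""
    html = f"<center><font size=6>{title}</font></center><br>"
    for frame in frames:
        html += frame.to_html()
    # Target only opening tags so the closing </table> tags stay untouched.
    return html.replace("<table", '<table style="display:inline"')

left = pd.DataFrame({"Name of Movie": ["Hugo"], "Cost": ["$180,000,000"]})
right = pd.DataFrame({"Name of Movie": ["Wonder"], "Cost": ["$20,000,000"]})
html = side_by_side_html("PG-rated movies", left, right)
print(html.count('style="display:inline"'))
```

Replacing the bare word `table` would also rewrite the closing tags and any cell text that contains it, which is why the sketch matches `<table` instead.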

Below will be the creation of dataframes that are in the 'Drama Genre' that are 'PG-13 rated' based on the 'ROI' of each movie.

Index of all the 'PG-13' rated movies.

In [402]:
pg13_index = []
for i,x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13':pg13_index.append(i)
print(pg13_index) #showing the pg13_index list
[2, 4, 7, 8, 12, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 60, 63, 65, 68, 69, 70, 72, 73, 74, 75, 78, 79, 80, 83, 86, 89, 91, 95, 96, 99, 100, 102, 104, 105, 108, 109, 113, 114, 115, 117, 119, 122, 123, 131, 132, 143, 149, 151]

Checking the number of elements in the 'pg13_index' list.

In [562]:
len(pg13_index)
Out[562]:
76

Getting the profit of all 'PG-13' rated movies.

In [100]:
pg13_profit = []
for i in pg13_index:
        pg13_profit.append(Drama_DataFrame.Profit[i])
print(pg13_profit) #showing the pg13_profit list
[583698673.0, 559454789.0, 77551594.0, -12181087.0, 35552675.0, 163591522.0, 129748880.0, 58660270.0, -8357834.0, -23612961.0, 22004627.0, 156127894.0, 4478084.0, 122498338.0, 129590606.0, -23659233.0, -8875633.0, 78809717.0, 136567581.0, 60143987.0, 49309093.0, 217276928.0, 26721826.0, 29802928.0, 132552290.0, 167618160.0, 38984536.0, -13518595.0, 66050951.0, -11684491.0, 15059418.0, 188120004.0, 117033509.0, 71633833.0, 41540205.0, 4847480.0, 57917283.0, 40282881.0, -15953962.0, 188265198.0, 2281732.0, 57086711.0, -3810190.0, -10319750.0, 317522294.0, 21028230.0, 36545707.0, 40506120.0, 113955898.0, 5601987.0, 44168692.0, 20044909.0, 20069303.0, 20909437.0, 11477345.0, 67356170.0, 51076141.0, 51603136.0, 21556959.0, 27087044.0, 72831866.0, 12971021.0, 23787727.0, 29964656.0, 10369708.0, 143806510.0, 36699612.0, 13945682.0, 1205034.0, -3472240.0, -3157373.0, 12698355.0, 33185884.0, 4152584.0, 3478400.0, 1927779.0]

Checking the number of elements in the 'pg13_profit' list.

In [277]:
len(pg13_profit)
Out[277]:
76

Getting the cost (production budget) of all 'PG-13' rated movies.

In [101]:
pg13_cost = []
for i in pg13_index:
        pg13_cost.append(Drama_DataFrame.Production_Budget_x[i])
print(pg13_cost) #showing the pg13_cost list
['$110,000,000', '$75,000,000', '$60,000,000', '$60,000,000', '$55,000,000', '$50,000,000', '$50,000,000', '$50,000,000', '$50,000,000', '$50,000,000', '$49,000,000', '$47,000,000', '$44,000,000', '$40,000,000', '$40,000,000', '$40,000,000', '$40,000,000', '$38,000,000', '$37,000,000', '$37,000,000', '$36,000,000', '$35,000,000', '$35,000,000', '$34,000,000', '$33,000,000', '$30,000,000', '$30,000,000', '$30,000,000', '$28,000,000', '$27,500,000', '$26,000,000', '$25,000,000', '$25,000,000', '$25,000,000', '$25,000,000', '$25,000,000', '$25,000,000', '$24,000,000', '$21,000,000', '$20,000,000', '$20,000,000', '$19,000,000', '$18,000,000', '$18,000,000', '$17,000,000', '$17,000,000', '$16,000,000', '$16,000,000', '$15,000,000', '$15,000,000', '$15,000,000', '$14,000,000', '$13,000,000', '$12,000,000', '$12,000,000', '$11,000,000', '$11,000,000', '$10,000,000', '$10,000,000', '$9,700,000', '$9,000,000', '$9,000,000', '$7,400,000', '$7,000,000', '$6,000,000', '$5,000,000', '$5,000,000', '$5,000,000', '$5,000,000', '$4,500,000', '$4,357,373', '$2,600,000', '$2,000,000', '$1,400,000', '$250,000', '$175,000']

Checking the number of elements in the 'pg13_cost' list.

In [278]:
len(pg13_cost)
Out[278]:
76

Getting the name of all 'PG-13' rated movies.

In [102]:
pg13_name = []
for i in pg13_index:
        pg13_name.append(Drama_DataFrame.Movie[i])
print(pg13_name) #showing the pg13_name list
['Gravity', 'Sing', 'Contagion', 'Trouble with the Curve', 'Burlesque', 'Creed II', 'The Post', 'Hereafter', 'Dream House', 'Upside Down', 'Anna Karenina', 'Arrival', 'Charlie St. Cloud', 'Bridge of Spies', 'The Impossible', 'Paranoia', 'Victor Frankenstein', 'Water for Elephants', 'Creed', 'The Rite', 'Collateral Beauty', 'True Grit', 'The Tree of Life', 'The Longest Ride', 'Step Up Revolution', 'The Vow', 'The Age of Adaline', 'The Space Between Us', 'Safe Haven', 'Anonymous', 'The Best of Me', 'The Help', 'Dear John', 'The Lucky One', 'The Giver', 'Draft Day', 'Rings', 'Fences', 'The Beaver', 'Me Before You', 'The Light Between Oceans', 'The Book Thief', 'Labor Day', 'Midnight Special', 'A Quiet Place', 'Beastly', 'The Roommate', 'Remember Me', 'The Woman in Black', 'Country Strong', 'One Day', 'Suffragette', 'The Perks of Being a Wallflower', 'Project Almanac', 'Wish Upon', 'If I Stay', 'Brooklyn', 'Everything, Everything', 'Mud', 'Amour', 'Ouija: Origin of Evil', 'Black or White', 'The Bye Bye Man', 'Gifted', 'The Words', 'Lights Out', 'Still Alice', 'Before I Fall', 'Rabbit Hole', 'Maggie', 'Anna', 'Ida', 'Courageous', 'Mustang', 'Like Crazy', 'Another Earth']

Checking the number of elements in the 'pg13_name' list.

In [279]:
len(pg13_name)
Out[279]:
76

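Getting the formatted profit of all 'PG-13' rated movies from the Profit_x column. The dataframe built later labels these dollar amounts 'Return On Investment'.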

In [103]:
pg13_return_on_investment = []
for i in pg13_index:
        pg13_return_on_investment.append(Drama_DataFrame.Profit_x[i])
print(pg13_return_on_investment) #showing the pg13_return_on_investment list
['$583,698,673', '$559,454,789', '$77,551,594', '$-12,181,087', '$35,552,675', '$163,591,522', '$129,748,880', '$58,660,270', '$-8,357,834', '$-23,612,961', '$22,004,627', '$156,127,894', '$4,478,084', '$122,498,338', '$129,590,606', '$-23,659,233', '$-8,875,633', '$78,809,717', '$136,567,581', '$60,143,987', '$49,309,093', '$217,276,928', '$26,721,826', '$29,802,928', '$132,552,290', '$167,618,160', '$38,984,536', '$-13,518,595', '$66,050,951', '$-11,684,491', '$15,059,418', '$188,120,004', '$117,033,509', '$71,633,833', '$41,540,205', '$4,847,480', '$57,917,283', '$40,282,881', '$-15,953,962', '$188,265,198', '$2,281,732', '$57,086,711', '$-3,810,190', '$-10,319,750', '$317,522,294', '$21,028,230', '$36,545,707', '$40,506,120', '$113,955,898', '$5,601,987', '$44,168,692', '$20,044,909', '$20,069,303', '$20,909,437', '$11,477,345', '$67,356,170', '$51,076,141', '$51,603,136', '$21,556,959', '$27,087,044', '$72,831,866', '$12,971,021', '$23,787,727', '$29,964,656', '$10,369,708', '$143,806,510', '$36,699,612', '$13,945,682', '$1,205,034', '$-3,472,240', '$-3,157,373', '$12,698,355', '$33,185,884', '$4,152,584', '$3,478,400', '$1,927,779']

Checking the number of elements in the 'pg13_return_on_investment' list.

In [567]:
len(pg13_return_on_investment)
Out[567]:
76

Getting the average rating of all 'PG-13' rated movies.

In [104]:
pg13_rating = []
for i in pg13_index:
        pg13_rating.append(Drama_DataFrame.Averagerating[i])
print(pg13_rating) #showing the pg13_rating list
[7.7, 7.1, 6.6, 6.8, 6.4, 7.2, 7.2, 6.5, 6.0, 6.6, 6.6, 7.9, 6.5, 7.6, 7.6, 5.7, 6.0, 6.9, 5.8, 6.0, 6.8, 7.6, 6.8, 7.1, 6.5, 6.8, 7.2, 6.4, 6.7, 6.9, 6.7, 8.1, 6.3, 6.5, 6.5, 6.8, 4.5, 7.2, 6.7, 7.4, 7.2, 7.6, 6.9, 6.6, 6.6, 5.6, 4.9, 7.1, 6.4, 6.3, 7.0, 6.9, 8.0, 6.4, 5.0, 6.8, 7.5, 6.4, 7.4, 7.9, 6.1, 6.6, 4.3, 5.6, 7.1, 6.4, 7.5, 6.4, 7.0, 6.4, 6.5, 7.4, 7.0, 7.6, 6.7, 7.0]

Checking the number of elements in the 'pg13_rating' list.

In [281]:
len(pg13_rating)
Out[281]:
76

Getting the profit percentage (profit divided by production budget, times 100) of all 'PG-13' rated movies.

In [403]:
pg13_percent_profit = []
for i in pg13_index:
    percent = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100 #profit as a percentage of budget
    pg13_percent_profit.append(int(round(percent,0)))
print(pg13_percent_profit) #showing the pg13_percent_profit list
[531, 746, 129, -20, 65, 327, 259, 117, -17, -47, 45, 332, 10, 306, 324, -59, -22, 207, 369, 163, 137, 621, 76, 88, 402, 559, 130, -45, 236, -42, 58, 752, 468, 287, 166, 19, 232, 168, -76, 941, 11, 300, -21, -57, 1868, 124, 228, 253, 760, 37, 294, 143, 154, 174, 96, 612, 464, 516, 216, 279, 809, 144, 321, 428, 173, 2876, 734, 279, 24, -77, -72, 488, 1659, 297, 1391, 1102]

Checking the number of elements in the 'pg13_percent_profit' list.

In [570]:
len(pg13_percent_profit)
Out[570]:
76

Formatting the integer ROI percentages of all 'PG-13' rated movies as percentage strings.

In [106]:
pg13_roi_percent = []
for i in pg13_percent_profit:
    pg13_roi_percent.append("{:}%".format(i))
print(pg13_roi_percent) #showing the pg13_roi_percent list
['531%', '746%', '129%', '-20%', '65%', '327%', '259%', '117%', '-17%', '-47%', '45%', '332%', '10%', '306%', '324%', '-59%', '-22%', '207%', '369%', '163%', '137%', '621%', '76%', '88%', '402%', '559%', '130%', '-45%', '236%', '-42%', '58%', '752%', '468%', '287%', '166%', '19%', '232%', '168%', '-76%', '941%', '11%', '300%', '-21%', '-57%', '1868%', '124%', '228%', '253%', '760%', '37%', '294%', '143%', '154%', '174%', '96%', '612%', '464%', '516%', '216%', '279%', '809%', '144%', '321%', '428%', '173%', '2876%', '734%', '279%', '24%', '-77%', '-72%', '488%', '1659%', '297%', '1391%', '1102%']

Checking the number of elements in the 'pg13_roi_percent' list.

In [572]:
len(pg13_roi_percent)
Out[572]:
76

Turning the whole-number part of each movie's average rating into a string of stars for all 'PG-13' rated movies.

In [107]:
pg13_stars = []
for i in pg13_rating:
    pg13_stars.append('*'*int(i))
print(pg13_stars) #showing the pg13_stars list
['*******', '*******', '******', '******', '******', '*******', '*******', '******', '******', '******', '******', '*******', '******', '*******', '*******', '*****', '******', '******', '*****', '******', '******', '*******', '******', '*******', '******', '******', '*******', '******', '******', '******', '******', '********', '******', '******', '******', '******', '****', '*******', '******', '*******', '*******', '*******', '******', '******', '******', '*****', '****', '*******', '******', '******', '*******', '******', '********', '******', '*****', '******', '*******', '******', '*******', '*******', '******', '******', '****', '*****', '*******', '******', '*******', '******', '*******', '******', '******', '*******', '*******', '*******', '******', '*******']

Checking the number of elements in the 'pg13_stars' list.

In [284]:
len(pg13_stars)
Out[284]:
76
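
The extraction steps above repeat, for 'PG-13', the same sequence already used for 'PG'. As a sketch of how they could be folded into one function, the code below builds the five display columns for any rating in a single pass. `build_rating_frame` is a hypothetical name, the `toy` frame is a two-row stand-in for `Drama_DataFrame`, and its numeric budget and profit are assumed to equal the formatted strings.

```python
import pandas as pd

def build_rating_frame(drama: pd.DataFrame, rating: str) -> pd.DataFrame:
    """Assemble the five display columns for one rating in a single pass."""
    rows = drama[drama["Rating"] == rating]
    percent = (rows["Profit"] / rows["Production_Budget"] * 100).round(0).astype(int)
    return pd.DataFrame({
        "Name of Movie": rows["Movie"],
        "Cost": rows["Production_Budget_x"],
        "Return On Investment": rows["Profit_x"],
        "ROI Percentage": percent.astype(str) + "%",
        "Ratings": rows["Averagerating"].map(lambda r: "*" * int(r)),
    }).reset_index(drop=True)

# Hypothetical two-row stand-in for Drama_DataFrame, using values printed above.
toy = pd.DataFrame({
    "Movie": ["Gravity", "Hugo"],
    "Rating": ["PG-13", "PG"],
    "Production_Budget_x": ["$110,000,000", "$180,000,000"],
    "Profit_x": ["$583,698,673", "$47,784"],
    "Production_Budget": [110_000_000.0, 180_000_000.0],
    "Profit": [583_698_673.0, 47_784.0],
    "Averagerating": [7.7, 7.5],
})
pg13 = build_rating_frame(toy, "PG-13")
print(pg13.iloc[0].tolist())
```

Filtering with a boolean mask also avoids relying on row positions matching index labels.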

Creating the 'PG-13' rated dataframe with the variables previously created.

In [131]:
system_rating_pg13 = pd.DataFrame({"Name of Movie":pg13_name, "Cost":pg13_cost, 
                                "Return On Investment":pg13_return_on_investment, 
                                "ROI Percentage":pg13_roi_percent,"Ratings":pg13_stars})

The 'system_rating_pg13' dataframe. (this dataframe is interactive)

In [404]:
system_rating_pg13
Out[404]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

Getting the index of every movie whose ROI percentage is zero or negative.

In [405]:
neg_values = []
for i,x in enumerate(pg13_percent_profit): 
    if x <= 0: neg_values.append(i)
print(neg_values) #showing the neg_values list
[3, 8, 9, 15, 16, 27, 29, 38, 42, 43, 69, 70]

Checking the number of elements in the 'neg_values' list.

In [576]:
len(neg_values)
Out[576]:
12

Dropping the negative values and resetting the index of the system_rating_pg13 dataframe.

In [406]:
system_rating_pg13= system_rating_pg13.drop(labels=neg_values, axis=0)
system_rating_pg13 = system_rating_pg13.reset_index(drop=True)

The new 'system_rating_pg13' dataframe. It will be divided into three dataframes. (this dataframe is interactive)

In [407]:
system_rating_pg13
Out[407]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

System_rating_pg131 is the first dataframe. (this dataframe is interactive)

In [408]:
system_rating_pg131=system_rating_pg13[:22]
system_rating_pg131
Out[408]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

System_rating_pg132 is the second dataframe. (this dataframe is interactive)

In [409]:
system_rating_pg132=system_rating_pg13[22:42]
system_rating_pg132
Out[409]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

System_rating_pg133 is the third dataframe. (this dataframe is interactive)

In [410]:
system_rating_pg133=system_rating_pg13[42:]
system_rating_pg133
Out[410]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

Getting the average Budget of all the 'PG-13' rated movies in the Drama genre.

In [581]:
pg13_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                 for i in system_rating_pg13['Cost']]) / len(system_rating_pg13['Cost'])

The average Budget of all the 'PG-13' rated Drama movies is $24,695,703.

In [583]:
pg13_avg_value
Out[583]:
24695703.125

Getting the index of all the movies whose budget is at or below the average Budget of all the 'PG-13' rated Drama movies (pg13_below_avg1), and the index of all the movies whose budget is at or above it (pg13_below_avg2).

In [584]:
pg13_cost_index = [int(i.replace('$', '').replace(',', ''))
                   for i in system_rating_pg13['Cost']]
#at or below avg
pg13_below_avg1 = []
for i,x in enumerate(pg13_cost_index):
    if x <= 24695703:pg13_below_avg1.append(i)
    
#at or above avg (despite the 'below' in the name)
pg13_below_avg2 = []
for i,x in enumerate(pg13_cost_index):
    if x >= 24695703:pg13_below_avg2.append(i)

The 'pg13_below_avg1' list.

In [585]:
print(pg13_below_avg1)
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]

The 'pg13_below_avg2' list.

In [586]:
print(pg13_below_avg2)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
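The two loops compare against a hard-coded, truncated average with `<=` and `>=`, so a value exactly equal to the constant would land in both lists. The sketch below is one possible alternative, not the notebook's method: a hypothetical helper `split_by_threshold` takes the computed average directly and uses a strict comparison on one side. The budgets are made up.

```python
def split_by_threshold(values, threshold):
    """Return (below, at_or_above) lists of positions."""
    below = [i for i, v in enumerate(values) if v < threshold]
    at_or_above = [i for i, v in enumerate(values) if v >= threshold]
    return below, at_or_above

costs = [40_000_000, 30_000_000, 10_000_000, 5_000_000]  # made-up budgets
average = sum(costs) / len(costs)
below, at_or_above = split_by_threshold(costs, average)
print(below, at_or_above)
```

With this split every position appears in exactly one of the two lists.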

Getting the average Return On Investment of all the 'PG-13' rated movies in the Drama genre.

In [587]:
pg13_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                for i in system_rating_pg13['Return On Investment']]) / len(system_rating_pg13['Cost'])

The average Return On Investment of all the 'PG-13' rated Drama movies is $79,724,974.

In [588]:
pg13_avg_value
Out[588]:
79724974.890625

Getting the index of all the movies that are below the average Return On Investment of all the 'PG-13' rated Drama movies ('pg13_below_avg3'). Getting the index of all the movies that are above the average Return On Investment of all the 'PG-13' rated Drama movies ('pg13_below_avg4').

In [589]:
pg13_roi_index = [int(i.replace('$', '').replace(',', ''))
                  for i in system_rating_pg13['Return On Investment']]
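# Indexes of the movies whose ROI is at or below the average.
pg13_below_avg3 = []
for i,x in enumerate(pg13_roi_index):
    if x <= 79724974:pg13_below_avg3.append(i)
    
# Despite its name, this list holds the indexes at or above the average ROI.
pg13_below_avg4 = []
for i,x in enumerate(pg13_roi_index):
    if x >= 79724974:pg13_below_avg4.append(i)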

The 'pg13_below_avg3' list.

In [591]:
print(pg13_below_avg3)
[2, 3, 6, 7, 9, 12, 14, 15, 17, 18, 21, 22, 23, 26, 27, 28, 29, 30, 32, 33, 35, 36, 37, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61, 62, 63]

The 'pg13_below_avg4' list.

In [590]:
print(pg13_below_avg4)
[0, 1, 4, 5, 8, 10, 11, 13, 16, 19, 20, 24, 25, 31, 34, 38, 55]

Getting the average Return On Investment Percentage of all the 'PG-13' rated movies in the Drama genre.

In [592]:
pg13_avg_value = sum([int(i.replace('%', ''))
                 for i in system_rating_pg13['ROI Percentage']]) / len(system_rating_pg13['Cost'])

The average Return On Investment Percentage of all the 'PG-13' rated Drama movies is 414%.

In [593]:
pg13_avg_value
Out[593]:
414.4375

Getting the index of all the movies that are at or above the average Return On Investment Percentage of all the 'PG-13' rated Drama movies.

In [596]:
roi_percent_index_pg13 = [int(i.replace('%', ''))
                          for i in system_rating_pg13['ROI Percentage']]
# Indexes of the movies whose ROI percentage is at or above the average.
pg13_above_avg = []
for i,x in enumerate(roi_percent_index_pg13):
    if x >= 414:pg13_above_avg.append(i)

The 'pg13_above_avg' list.

In [597]:
print(pg13_above_avg)
[0, 1, 16, 20, 24, 25, 31, 34, 38, 45, 46, 47, 50, 53, 55, 56, 59, 60, 62, 63]

Styling System_rating_pg131 with the eight functions below and the indexes computed above.

In [606]:
def Ratings48(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(22):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight49(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(22):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#DE3163;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df 


def highlight_cells50(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg13_below_avg1:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df 

def highlight_cells51(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg13_below_avg2[:22]:
        df.iloc[i,1] = 'background-color:#DE3163;color:white;border-bottom:2px solid black'
    return df 

def highlight_cells52(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg13_below_avg3[:11]:
        df.iloc[i,2] = 'background-color:#DE3163;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells53(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg13_below_avg4[:11]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells54(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg13_above_avg[:4]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 


def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 


system_rating_pg131=system_rating_pg131.style.apply(Ratings_highlight49, axis=None)\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid #DE3163')]},
            {"selector":"thead", 'props':[("background-color","white"),("color","#DE3163")]},#headinig
            #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','#DE3163')]}#index
                         ])\
            .apply(Ratings48, axis=None)\
            .apply(highlight_cells51, axis=None)\
            .apply(highlight_cells52, axis=None)\
            .apply(highlight_cells53, axis=None)\
            .apply(highlight_cells54, axis=None)\
            #.apply(highlight_cells50, axis=None)\
            #.set_table_attributes("style='display:inline'")\
            #.set_caption('Caption table 1')
            
            

The 'System_rating_pg131' dataframe.
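Each styling function above follows the same pattern: build a frame of empty strings with the same shape as the data, then write CSS into selected cells. A minimal sketch of that pattern as a reusable factory is shown below. The name `make_highlighter` and the toy data are hypothetical, and the empty frame is built with the `pd.DataFrame` constructor rather than by overwriting a copy of the data.

```python
import pandas as pd

def make_highlighter(rows, column, css):
    """Build a function usable with Styler.apply(..., axis=None)."""
    def highlighter(data):
        # Start from an all-empty frame of CSS strings with the same shape.
        styles = pd.DataFrame("", index=data.index, columns=data.columns)
        for row in rows:
            styles.iloc[row, column] = css
        return styles
    return highlighter

toy = pd.DataFrame({"Name of Movie": ["A", "B", "C"],
                    "Cost": ["$3", "$2", "$1"]})
highlight_cost = make_highlighter([0, 2], 1, "background-color:yellow")
styled_cells = highlight_cost(toy)
print(styled_cells)
# The function could then be passed to toy.style.apply(highlight_cost, axis=None).
```

One factory call per colour and column would stand in for the separately numbered `highlight_cells` functions.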

Styling System_rating_pg132 with the eight functions below and the indexes computed above.

In [607]:
def Ratings48(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(20):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight49(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(20):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#DE3163;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df 


def highlight_cells50(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(8,20):
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df 

def highlight_cells51(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(8):
        df.iloc[i,1] = 'background-color:#DE3163;color:white;border-bottom:2px solid black'
    return df 

def highlight_cells52(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [0, 1, 4, 5, 6, 7, 8, 10, 11, 13, 14, 15, 17, 18, 19]:
        df.iloc[i,2] = 'background-color:#DE3163;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells53(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [2, 3, 9, 16, 12]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells54(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [2, 3, 9, 16, 12]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 


def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 


system_rating_pg132=system_rating_pg132.style.apply(Ratings_highlight49, axis=None)\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid #DE3163')]},
            {"selector":"thead", 'props':[("background-color","white"),("color","#DE3163")]},#headinig
            #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','#DE3163')]}#index
                         ])\
            .apply(Ratings48, axis=None)\
            .apply(highlight_cells51, axis=None)\
            .apply(highlight_cells52, axis=None)\
            .apply(highlight_cells53, axis=None)\
            .apply(highlight_cells54, axis=None)\
            .apply(highlight_cells50, axis=None)\
            #.set_table_attributes("style='display:inline'")\
            #.set_caption('Caption table 1')
            
            

The 'System_rating_pg132' dataframe.
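The hard-coded row lists in the cell above are positions inside the slice (rows 22 to 41 of the full dataframe), not positions in the full dataframe. A small sketch of how such local positions can be derived from the printed index lists follows; `to_local` is a hypothetical helper name.

```python
def to_local(global_indexes, start, stop):
    """Keep indexes inside [start, stop) and shift them to start at 0."""
    return [i - start for i in global_indexes if start <= i < stop]

# Values printed earlier in the notebook for the full PG-13 dataframe.
pg13_below_avg4 = [0, 1, 4, 5, 8, 10, 11, 13, 16, 19, 20,
                   24, 25, 31, 34, 38, 55]

local_rows = to_local(pg13_below_avg4, 22, 42)
print(local_rows)  # [2, 3, 9, 12, 16]
```

These are the same five rows that the yellow ROI highlight for the second dataframe lists by hand.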

Styling System_rating_pg133 with the eight functions below and the indexes computed above.

In [608]:
def Ratings48(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(22):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight49(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(22):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#DE3163;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df 


def highlight_cells50(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(22):
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df 

def highlight_cells51(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in pg13_below_avg2[:22]:
        df.iloc[i,1] = 'background-color:#DE3163;color:white;border-bottom:2px solid black'
    return df 

def highlight_cells52(x):
    df = x.copy()
    df.loc[:,:] = '' 
    # Rows below the average ROI: every row except local row 13 (index 55 - 42).
    for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21]:
        df.iloc[i,2] = 'background-color:#DE3163;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells53(x):
    df = x.copy()
    df.loc[:,:] = '' 
    # Index 55 in 'pg13_below_avg4' is the only above-average row in this slice.
    df.iloc[13,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells54(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [3, 4, 5, 8, 11, 13, 14, 17, 18, 20, 21 ]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 


def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 


system_rating_pg133 = system_rating_pg133.style.apply(Ratings_highlight49, axis=None)\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid #DE3163')]},
            {"selector":"thead", 'props':[("background-color","white"),("color","#DE3163")]},#headinig
            #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','#DE3163')]}#index
                         ])\
            .apply(Ratings48, axis=None)\
            .apply(highlight_cells50, axis=None)\
            .apply(highlight_cells52, axis=None)\
            .apply(highlight_cells53, axis=None)\
            .apply(highlight_cells54, axis=None)\
           
            #.apply(highlight_cells51, axis=None)\
            
            #.set_table_attributes("style='display:inline'")\
            #.set_caption('Caption table 1')
            
            

The 'System_rating_pg133' dataframe.

Saving the System_rating_pg131 dataframe to the System_rating_pg131.png file as an image to be used for the analysis later on.

In [609]:
dfi.export(system_rating_pg131, 'system_rating_pg131.png')

Saving the System_rating_pg132 dataframe to the System_rating_pg132.png file as an image to be used for the analysis later on.

In [610]:
dfi.export(system_rating_pg132, 'system_rating_pg132.png')

Saving the System_rating_pg133 dataframe to the System_rating_pg133.png file as an image to be used for the analysis later on.

In [611]:
dfi.export(system_rating_pg133, 'system_rating_pg133.png')

This allows all the three dataframes to be displayed side by side.

In [303]:
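from IPython.display import display_html

def display_side_by_side4(*args):
    # Title shown above the tables.
    html_str = "<center><font size=6 style='color:#DE3163'>The Return On Investment on PG-13 rated Movies.</font></center> <br>  " 
 
    # Append the HTML of each styled dataframe.
    for df in args:
        html_str += df.to_html()
    # Inline display lets the tables sit next to each other.
    display_html(
        html_str.replace('table','table style="display:inline"'), 
        raw=True
    )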
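The function above replaces every occurrence of the word 'table', which also rewrites the closing tags into `</table style="display:inline">`. A possible refinement, offered only as a sketch with a hypothetical helper name, is to target the opening tag alone:

```python
def inline_tables(html):
    """Add an inline display style to each opening <table> tag only."""
    return html.replace("<table", '<table style="display:inline"')

sample = "<table><tr><td>1</td></tr></table>"
result = inline_tables(sample)
print(result)
```

The closing `</table>` tags are left untouched, so the output stays well-formed.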

Below, dataframes are created for the 'NC-17' rated movies in the 'Drama' genre, based on the 'ROI' of each movie.

Index of all the 'NC-17' rated movies.

In [411]:
nc17_index = []
for i,x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17':nc17_index.append(i)
print(nc17_index) #showing the nc17_index list
[112, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305]
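The index search above can also be written as a list comprehension. The sketch below uses a made-up list of ratings in place of the `Drama_DataFrame.Rating` column.

```python
ratings = ["R", "NC-17", "PG-13", "NC-17", "PG"]  # made-up sample column

# Same result as the explicit loop: positions where the rating matches.
nc17_positions = [i for i, rating in enumerate(ratings) if rating == "NC-17"]
print(nc17_positions)
```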

Checking the number of elements in the 'nc17_index' list.

In [613]:
len(nc17_index)
Out[613]:
49

Getting the Profit for all 'NC-17' rated movies.

In [116]:
nc17_profit = []
for i in nc17_index:
        nc17_profit.append(Drama_DataFrame.Profit[i])
print(nc17_profit) #showing the nc17_profit list
[13912841.0, 4856268.0, 8404.0, 257845.0, 659312.0, 18912216.0, -24649246.0, 89410061.0, -4503941.0, 121165.0, -1712236.0, 52091915.0, 13912841.0, 15465835.0, -7249246.0, 307113.0, 13912841.0, 15390895.0, 15566240.0, 1315026.0, 256669.0, 201120004.0, -5340890.0, 50167430.0, -17763156.0, 2311944.0, -794431.0, 13912841.0, -2605698.0, 2548651.0, -216465.0, -596907.0, 16283563.0, 3664240.0, 1038916.0, -2509128.0, 8000000.0, 18912216.0, 94673038.0, 34897711.0, 401802.0, 50167430.0, 3546453.0, -672713.0, -13085834.0, -3838180.0, 958404.0, -2237424.0, 858737.0]

Checking the number of elements in the 'nc17_profit' list.

In [615]:
len(nc17_profit)
Out[615]:
49

Getting the Cost for all 'NC-17' rated movies.

In [117]:
nc17_cost = []
for i in nc17_index:
        nc17_cost.append(Drama_DataFrame.Production_Budget_x[i])
print(nc17_cost) #showing the nc17_cost list
['$6,500,000', '$12,500,000', '$1,000,000', '$20,000', '$955,472', '$1,500,000', '$45,000,000', '$9,000,000', '$5,000,000', '$15,000,000', '$2,734,384', '$15,000,000', '$6,500,000', '$4,000,000', '$45,000,000', '$15,000,000', '$6,500,000', '$4,074,940', '$1,000,000', '$1,000,000', '$3,565,572', '$12,000,000', '$10,000,000', '$15,000,000', '$19,000,000', '$350,000', '$1,000,000', '$6,500,000', '$4,700,000', '$904,765', '$3,000,000', '$700,000', '$34,000,000', '$230,000', '$1,000,000', '$3,200,000', '$1,000,000', '$1,500,000', '$6,500,000', '$1,250,000', '$12,000', '$15,000,000', '$2,200,000', '$1,300,000', '$15,000,000', '$6,400,000', '$50,000', '$3,259,572', '$612,072']

Checking the number of elements in the 'nc17_cost' list.

In [617]:
len(nc17_cost)
Out[617]:
49

Getting the Name for all 'NC-17' rated movies.

In [118]:
nc17_name = []
for i in nc17_index:
        nc17_name.append(Drama_DataFrame.Movie[i])
print(nc17_name) #showing the nc17_name list
['Shame', 'Matador', 'Whore', 'Tokyo Decadence', 'Wide Sargasso Sea', 'Kids', 'Showgirls', 'Crash', 'Bent', 'The Dreamers', 'Ma mère', 'Lust, Caution', 'Shame', 'Blue Is the Warmest Colour', 'Showgirls', 'The Dreamers', 'Shame', 'Blue Is the Warmest Colour', 'Blue Valentine', 'Two Girls and a Guy', 'Elles', 'Hell', 'Killer Joe', 'Se, jie', 'Queen of Hearts', 'The Evil Dead', 'Man Bites Dog', 'Shame', 'Nymphomaniac: Vol. I', 'Arabian Nights', 'Frontier(s)', 'Chained', 'Natural Born Killers', 'Clerks', 'Bad Lieutenant', 'The Big Feast', 'Beyond the Valley of the Dolls', 'Kids', 'Crash', 'Last Tango in Paris', 'Pink Flamingos', 'Lust, Caution ', 'Happiness 1998', 'Orgazmo', 'A Dirty Shame', 'Young Adam', 'Whore 1991', 'Ma Mère', 'Law of Desire']

Checking the number of elements in the 'nc17_name' list.

In [307]:
len(nc17_name)
Out[307]:
49

Getting the ROI for all 'NC-17' rated movies.

In [119]:
nc17_return_on_investment = []
for i in nc17_index:
        nc17_return_on_investment.append(Drama_DataFrame.Profit_x[i])
print(nc17_return_on_investment) #showing the nc17_return_on_investment list
['$13,912,841', '$4,856,268', '$8,404', '$257,845', '$659,312', '$18,912,216', '$-24,649,246', '$89,410,061', '$-4,503,941', '$121,165', '$-1,712,236', '$52,091,915', '$13,912,841', '$15,465,835', '$-7,249,246', '$307,113', '$13,912,841', '$15,390,895', '$15,566,240', '$1,315,026', '$256,669', '$201,120,004', '$-5,340,890', '$50,167,430', '$-17,763,156', '$2,311,944', '$-794,431', '$13,912,841', '$-2,605,698', '$2,548,651', '$-216,465', '$-596,907', '$16,283,563', '$3,664,240', '$1,038,916', '$-2,509,128', '$8,000,000', '$18,912,216', '$94,673,038', '$34,897,711', '$401,802', '$50,167,430', '$3,546,453', '$-672,713', '$-13,085,834', '$-3,838,180', '$958,404', '$-2,237,424', '$858,737']

Checking the number of elements in the 'nc17_return_on_investment' list.

In [621]:
len(nc17_return_on_investment)
Out[621]:
49

Getting the Ratings of all 'NC-17' rated movies.

In [120]:
nc17_rating = []
for i in nc17_index:
        nc17_rating.append(Drama_DataFrame.Averagerating[i])
print(nc17_rating) #showing the nc17_rating list
[7.2, 7.0, 5.6, 6.0, 5.7, 7.1, 4.9, 6.4, 7.2, 7.2, 5.1, 7.5, 7.2, 7.7, 4.9, 7.1, 7.2, 7.7, 7.4, 5.5, 5.6, 5.9, 6.7, 7.5, 7.1, 7.4, 7.4, 7.2, 6.9, 6.7, 6.2, 6.4, 7.2, 7.7, 7.0, 6.2, 6.1, 7.0, 7.8, 6.9, 6.0, 7.5, 7.7, 6.1, 5.1, 6.4, 5.5, 5.0, 7.1]

Checking the number of elements in the 'nc17_rating' list.

In [309]:
len(nc17_rating)
Out[309]:
49

Getting the Profit Percentage (profit divided by production budget, times 100) of all 'NC-17' rated movies.

In [412]:
nc17_percent_profit = []
for i in nc17_index:
    i = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
    nc17_percent_profit.append(int(round(i,0)))
print(nc17_percent_profit) #showing the nc17_percent_profit list
[214, 39, 1, 1289, 69, 1261, -55, 993, -90, 1, -63, 347, 214, 387, -16, 2, 214, 378, 1557, 132, 7, 1676, -53, 334, -93, 661, -79, 214, -55, 282, -7, -85, 48, 1593, 104, -78, 800, 1261, 1457, 2792, 3348, 334, 161, -52, -87, -60, 1917, -69, 140]
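As a worked check of the percentage formula, the sketch below recomputes the first three values from the profit and cost lists printed earlier. It assumes the numeric `Production_Budget` column holds the same amounts as the formatted cost strings, and `roi_percent` is a hypothetical helper name.

```python
def roi_percent(profit, budget):
    """Profit as a whole-number percentage of the production budget."""
    return int(round(profit / budget * 100, 0))

# First three NC-17 entries listed above: (profit, budget).
samples = [(13912841.0, 6500000), (4856268.0, 12500000), (8404.0, 1000000)]
percentages = [roi_percent(p, b) for p, b in samples]
print(percentages)  # [214, 39, 1]
```

A movie that earned back less than its budget gets a negative percentage, which is what the later filtering step removes.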

Checking the number of elements in the 'nc17_percent_profit' list.

In [624]:
len(nc17_percent_profit)
Out[624]:
49

Formatting the integer ROI values of all 'NC-17' rated movies as percentage strings.

In [122]:
nc17_roi_percent = []
for i in nc17_percent_profit:
    nc17_roi_percent.append("{:}%".format(i))
print(nc17_roi_percent) #showing the nc17_roi_percent list
['214%', '39%', '1%', '1289%', '69%', '1261%', '-55%', '993%', '-90%', '1%', '-63%', '347%', '214%', '387%', '-16%', '2%', '214%', '378%', '1557%', '132%', '7%', '1676%', '-53%', '334%', '-93%', '661%', '-79%', '214%', '-55%', '282%', '-7%', '-85%', '48%', '1593%', '104%', '-78%', '800%', '1261%', '1457%', '2792%', '3348%', '334%', '161%', '-52%', '-87%', '-60%', '1917%', '-69%', '140%']

Checking the number of elements in the 'nc17_roi_percent' list.

In [626]:
len(nc17_roi_percent)
Out[626]:
49

Turning the whole-number part of each 'NC-17' rated movie's rating into a string of stars.

In [123]:
nc17_stars = []
for i in nc17_rating:
    nc17_stars.append('*'*int(i))
print(nc17_stars) #showing the nc17_stars list
['*******', '*******', '*****', '******', '*****', '*******', '****', '******', '*******', '*******', '*****', '*******', '*******', '*******', '****', '*******', '*******', '*******', '*******', '*****', '*****', '*****', '******', '*******', '*******', '*******', '*******', '*******', '******', '******', '******', '******', '*******', '*******', '*******', '******', '******', '*******', '*******', '******', '******', '*******', '*******', '******', '*****', '******', '*****', '*****', '*******']
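Note that `int()` truncates rather than rounds, so a rating of 7.7 still yields seven stars. A minimal sketch, with `to_stars` as a hypothetical helper name:

```python
def to_stars(rating):
    """One asterisk per whole point; the fractional part is discarded."""
    return "*" * int(rating)

for rating in (7.2, 7.7, 4.9):
    print(rating, to_stars(rating))
```

If rounding to the nearest star were preferred, `round(rating)` could replace `int(rating)`, but that would change the tables shown in this analysis.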

Checking the number of elements in the 'nc17_stars' list.

In [312]:
len(nc17_stars)
Out[312]:
49

Creating the 'NC-17' rated dataframe with the variables previously created.

In [130]:
system_rating_nc17 = pd.DataFrame({"Name of Movie":nc17_name, "Cost":nc17_cost, 
                                "Return On Investment":nc17_return_on_investment, 
                                "ROI Percentage":nc17_roi_percent,"Ratings":nc17_stars})
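The separate `len()` checks above all confirm that each list has 49 elements before the dataframe is built. A sketch of doing the same check in one call is shown below; `check_same_length` is a hypothetical helper and the lists are toy values.

```python
def check_same_length(**columns):
    """Raise ValueError if the named lists differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return next(iter(lengths.values()))

n_rows = check_same_length(name=["A", "B"], cost=["$1", "$2"], stars=["*", "**"])
print(n_rows)
```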

The 'system_rating_nc17' dataframe. (this dataframe is interactive)

In [413]:
system_rating_nc17
Out[413]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

Getting the index of all the movies whose ROI percentage is zero or negative.

In [414]:
neg_values = []
for i,x in enumerate(nc17_percent_profit): 
    if x <= 0: neg_values.append(i)
print(neg_values) #showing the neg_values list
[6, 8, 10, 14, 22, 24, 26, 28, 30, 31, 35, 43, 44, 45, 47]

Checking the number of elements in the 'neg_values' list.

In [637]:
len(neg_values)
Out[637]:
15

Dropping the rows with negative values and resetting the index of the 'system_rating_nc17' dataframe.

In [415]:
system_rating_nc17 = system_rating_nc17.drop(labels=neg_values, axis=0)
system_rating_nc17 = system_rating_nc17.reset_index(drop=True)

The new 'system_rating_nc17' dataframe. It will be divided into two dataframes. (this dataframe is interactive)

In [416]:
system_rating_nc17
Out[416]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

System_rating_nc171 is the first dataframe. (this dataframe is interactive)

In [417]:
system_rating_nc171=system_rating_nc17[:17]
system_rating_nc171
Out[417]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

System_rating_nc172 is the second dataframe. (this dataframe is interactive)

In [418]:
system_rating_nc172=system_rating_nc17[17:]
system_rating_nc172
Out[418]:
Name of Movie Cost Return On Investment ROI Percentage Ratings

Getting the average Budget of all the 'NC-17' rated movies in the Drama genre.

In [641]:
nc_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                 for i in system_rating_nc17['Cost']]) / len(system_rating_nc17['Cost'])

The average Budget of all the 'NC-17' rated Drama movies is $5,918,377.

In [644]:
nc_avg_value
Out[644]:
5918377.088235294

Getting the index of all the movies that are below the average Budget of all the 'NC-17' rated Drama movies ('nc_below_avg1'). Getting the index of all the movies that are above the average Budget of all the 'NC-17' rated Drama movies ('nc_below_avg2').

In [645]:
nc_cost_index = [int(i.replace('$', '').replace(',', ''))for i in system_rating_nc17['Cost']]
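# Indexes of the movies whose cost is at or below the average budget.
nc_below_avg1 = []
for i,x in enumerate(nc_cost_index):
    if x <= 5918377:nc_below_avg1.append(i)
    
# Despite its name, this list holds the indexes at or above the average budget.
nc_below_avg2 = []
for i,x in enumerate(nc_cost_index):
    if x >= 5918377:nc_below_avg2.append(i)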

The 'nc_below_avg1' list.

In [646]:
print(nc_below_avg1)
[2, 3, 4, 5, 10, 13, 14, 15, 16, 19, 21, 23, 24, 25, 26, 28, 29, 31, 32, 33]

The 'nc_below_avg2' list.

In [647]:
print(nc_below_avg2)
[0, 1, 6, 7, 8, 9, 11, 12, 17, 18, 20, 22, 27, 30]

Getting the average Return On Investment of all the 'NC-17' rated movies in the Drama genre.

In [648]:
nc_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                 for i in system_rating_nc17['Return On Investment']]) / len(system_rating_nc17['Cost'])

The average Return On Investment of all the 'NC-17' rated Drama movies is $22,347,672.

In [649]:
nc_avg_value
Out[649]:
22347672.55882353

Getting the index of all the movies that are below the average Return On Investment of all the 'NC-17' rated Drama movies ('nc_below_avg3'). Getting the index of all the movies that are above the average Return On Investment of all the 'NC-17' rated Drama movies ('nc_below_avg4').

In [650]:
nc_roi_index = [int(i.replace('$', '').replace(',', ''))
                for i in system_rating_nc17['Return On Investment']]
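# Indexes of the movies whose ROI is at or below the average.
nc_below_avg3 = []
for i,x in enumerate(nc_roi_index):
    if x <= 22347672:nc_below_avg3.append(i)
    
# Despite its name, this list holds the indexes at or above the average ROI.
nc_below_avg4 = []
for i,x in enumerate(nc_roi_index):
    if x >= 22347672:nc_below_avg4.append(i)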

The 'nc_below_avg3' list.

In [651]:
print(nc_below_avg3)
[0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 21, 22, 23, 24, 25, 26, 29, 31, 32, 33]

The 'nc_below_avg4' list.

In [652]:
print(nc_below_avg4)
[6, 8, 17, 18, 27, 28, 30]

Getting the average Return On Investment Percentage of all the 'NC-17' rated movies in the Drama genre.

In [653]:
nc_avg_value = sum([int(i.replace('%', ''))
                 for i in system_rating_nc17['ROI Percentage']]) / len(system_rating_nc17['Cost'])

The average Return On Investment Percentage of all the 'NC-17' rated Drama movies is 712%.

In [654]:
nc_avg_value
Out[654]:
712.5588235294117

Getting the index of all the movies that are at or above the average Return On Investment Percentage of all the 'NC-17' rated Drama movies.

In [655]:
roi_percent_index_nc = [int(i.replace('%', ''))for i in system_rating_nc17['ROI Percentage']]
# Indexes of the movies whose ROI percentage is at or above the average.
nc_above_avg = []
for i,x in enumerate(roi_percent_index_nc):
    if x >= 712:nc_above_avg.append(i)

The 'nc_above_avg' list.

In [656]:
print(nc_above_avg)
[3, 5, 6, 14, 17, 23, 25, 26, 27, 28, 29, 32]

Styling System_rating_nc171 with the eight functions below and the indexes computed above.

In [667]:
def Ratings1(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(17):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight2(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(17):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#581845;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
    return df 


def highlight_cells3(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in nc_below_avg1[:9]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells4(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in nc_below_avg2[:8]:
        df.iloc[i,1] = 'background-color:#581845;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells5(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in nc_below_avg3[:15]:
        df.iloc[i,2] = 'background-color:#581845;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells6(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in nc_below_avg4[:2]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells7(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in nc_above_avg[:4]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 

def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 


system_rating_nc171 = system_rating_nc171.style.apply(Ratings_highlight2, axis=None)\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid #581845')]},
            {"selector":"thead", 'props':[("background-color","white"),("color","#581845")]},#headinig
            #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','#581845')]}
                                                                                 ])\
            .apply(Ratings1, axis=None)\
            .apply(highlight_cells3, axis=None)\
            .apply(highlight_cells4, axis=None)\
            .apply(highlight_cells5, axis=None)\
            .apply(highlight_cells6, axis=None)\
            .apply(highlight_cells7, axis=None)\
            #.set_table_attributes("style='display:inline'")\
            #.set_caption('Caption table 1')

Saving the System_rating_nc171 dataframe to the System_rating_nc171.png file as an image to be used for the analysis later on.

In [668]:
dfi.export(system_rating_nc171, 'system_rating_nc171.png')

The 'System_rating_nc171' dataframe.

Styling System_rating_nc172 with the eight functions below and the indexes computed above.

In [157]:
def Ratings8(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(17):#range(19,37):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df 

def Ratings_highlight9(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in range(17):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#581845;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
    return df 

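def highlight_cells10(x):
    df = x.copy()
    df.loc[:,:] = '' 
    # Local rows below the average budget ('nc_below_avg1' entries minus 17).
    for i in [2, 4, 6, 7, 8, 9, 11, 12, 14, 15, 16]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 

def highlight_cells90(x):
    df = x.copy()
    df.loc[:,:] = '' 
    # Local rows at or above the average budget ('nc_below_avg2' entries minus 17);
    # local row 3 is index 20, a $6,500,000 budget.
    for i in [0, 1, 3, 5, 10, 13]:
        df.iloc[i,1] = 'background-color:#581845;color:white;border-bottom: 2px solid black'
    return df 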

def highlight_cells11(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [2, 3, 4, 5, 6, 7, 8, 9, 12, 14, 15, 16]:
        df.iloc[i,2] = 'background-color:#581845;color:white;border-bottom: 2px solid black'
    return df 

def highlight_cells12(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [0, 1, 10, 11, 13]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df 


def highlight_cells13(x):
    df = x.copy()
    df.loc[:,:] = '' 
    for i in [0, 6, 8, 9, 10, 11, 12, 15]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df 

def borders(x):
    df = x.copy()
    df.loc[:,:] = '' 
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df 

system_rating_nc172 = system_rating_nc172.style.apply(Ratings_highlight9, axis=None)\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid #581845')]},
            {"selector":"thead", 'props':[("background-color","white"),("color","#581845")]},#heading
            #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','#581845')]}#index
                         ])\
            .apply(Ratings8, axis=None)\
            .apply(highlight_cells10, axis=None)\
            .apply(highlight_cells90, axis=None)\
            .apply(highlight_cells11, axis=None)\
            .apply(highlight_cells12, axis=None)\
            .apply(highlight_cells13, axis=None)\
            #.set_table_attributes("style='display:inline'")\
            #.set_caption('Caption table 2')
            #.apply(borders, axis=None)
            #display_html(df1_style._repr_html_() + df2_style._repr_html_(), raw=True)

Saving the styled system_rating_nc172 dataframe as an image, system_rating_nc172.png, to be used for the analysis later on.

In [158]:
dfi.export(system_rating_nc172, 'system_rating_nc172.png')

The 'system_rating_nc172' dataframe.

This function allows all three dataframes to be displayed side by side.

In [329]:
def display_side_by_side5(*args):
    html_str = "<center><font size=6 style='color:#FF2400'>The Return On Investment on NC-17 Rated Movies.</font></center> <br>  " 
 
    for df in args:
        html_str += df.to_html()
    display_html(
        html_str.replace('table','table style="display:inline"'), 
        raw=True
    )
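
A possible refinement, offered as a sketch rather than the notebook's own method: the replace('table', ...) call above rewrites every occurrence of the word 'table', including closing tags. The hypothetical helper inline_tables below only touches opening table tags. The sample HTML string is made up for illustration.

```python
def inline_tables(html: str) -> str:
    """Add an inline display style to opening <table> tags only."""
    return html.replace('<table', '<table style="display:inline"')


# Made-up HTML standing in for the output of df.to_html().
sample_html = '<table id="T_1"><tr><td>1</td></tr></table>'
inlined = inline_tables(sample_html)
print(inlined)
```

Closing tags are left as they were, so the resulting markup stays well formed.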

Getting the RIO of all the 'R' rated movies.

In [159]:
RIO_R = []
for i in r_percent_profit:
    i /= 10
    RIO_R.append(i)
    
RIO_R.sort(reverse=True)
print(RIO_R) #showing the RIO_R list
[267.0, 244.8, 185.2, 155.7, 133.2, 132.7, 108.1, 105.6, 97.0, 85.0, 81.5, 81.3, 80.8, 73.1, 70.7, 67.5, 60.1, 59.3, 57.5, 50.4, 50.1, 46.5, 44.4, 41.8, 41.1, 40.8, 35.0, 26.3, 25.8, 25.0, 23.8, 21.8, 21.6, 19.9, 19.5, 17.9, 15.6, 13.3, 13.2, 8.1, 7.4, 6.5, 5.4, 4.4, 4.1, 4.0, 3.8, 3.6, 3.5, 2.2, 2.1, 1.6, 1.3, 0.5, 0.4, 0.0, -0.5, -0.9, -1.5, -2.0, -2.6, -2.9, -4.4, -4.9, -5.3, -5.7, -5.8, -6.2, -7.3, -7.5, -7.8, -8.1, -8.2, -8.3, -9.5, -9.6, -9.8]

Checking the number of positive values in the 'RIO_R' list. The last 22 entries, which are zero or negative, are dropped.

In [160]:
len(RIO_R[:-22])
Out[160]:
55
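
The slice [:-22] depends on knowing in advance how many entries are zero or negative. A minimal sketch of an alternative, assuming the goal is to keep only the positive values, is to filter on the value itself so the count does not have to be hard-coded. The list sample_rio is an illustrative subset, not the full data.

```python
# Illustrative subset of RIO values, sorted highest first.
sample_rio = [267.0, 44.4, 0.4, 0.0, -0.5, -9.8]

# Keep strictly positive values, whatever their position in the list.
positive_rio = [value for value in sample_rio if value > 0]

print(positive_rio)       # [267.0, 44.4, 0.4]
print(len(positive_rio))  # 3
```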

Formatting the RIO values as currency strings (dollars).

In [161]:
currency_R = []
for i in RIO_R[:-22]:
    currency_R.append("${:,.2f}".format(i))
print(currency_R) #showing the currency_R list
['$267.00', '$244.80', '$185.20', '$155.70', '$133.20', '$132.70', '$108.10', '$105.60', '$97.00', '$85.00', '$81.50', '$81.30', '$80.80', '$73.10', '$70.70', '$67.50', '$60.10', '$59.30', '$57.50', '$50.40', '$50.10', '$46.50', '$44.40', '$41.80', '$41.10', '$40.80', '$35.00', '$26.30', '$25.80', '$25.00', '$23.80', '$21.80', '$21.60', '$19.90', '$19.50', '$17.90', '$15.60', '$13.30', '$13.20', '$8.10', '$7.40', '$6.50', '$5.40', '$4.40', '$4.10', '$4.00', '$3.80', '$3.60', '$3.50', '$2.20', '$2.10', '$1.60', '$1.30', '$0.50', '$0.40']
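
For reference, a small sketch of what the "${:,.2f}" format specification does: the comma adds thousands separators and .2f fixes two decimal places. The f-string form is equivalent. The first two values come from the list above; the third is a made-up large number to show the separators.

```python
values = [267.0, 0.4, 1234567.891]  # last value is hypothetical

# str.format and f-strings accept the same format specification.
with_format = ["${:,.2f}".format(v) for v in values]
with_fstring = [f"${v:,.2f}" for v in values]

print(with_format)  # ['$267.00', '$0.40', '$1,234,567.89']
```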

Checking the number of elements in the 'currency_R' list.

In [674]:
len(currency_R)
Out[674]:
55

Getting the mean RIO of the 'R' rated movies that have a positive RIO.

In [162]:
avg_R = statistics.mean(RIO_R[:-22])
avg_R
Out[162]:
50.88727272727272

Getting the 25th, 50th and the 75th percentiles of the RIO of all the 'R' rated movies.

In [333]:
np.percentile(RIO_R[:-22], [25,50,75])
Out[333]:
array([ 6.95, 26.3 , 71.9 ])

Getting the names of the 'R' rated movies, in the same order as the RIO values, to create the dataframe_RIO_r dataframe.

In [163]:
final_name_r = []
reversed_name = []
for x,i in enumerate(r_name):
    reversed_name.append((r_percent_profit[x], i))
reversed_name.sort(reverse = True)
for i in reversed_name[:-22]:
    final_name_r.append(i[1])
print(final_name_r) #showing the final_name_r list
['A Ghost Story', 'Black Swan', 'Ghost Story', 'Blue Valentine', 'Boyhood', 'Fifty Shades of Grey', 'Whiplash', 'The Witch', 'Buried', 'Unsane', 'Manchester by the Sea', 'Ordinary People', 'Fame', 'Silent House', "Winter's Bone", 'Before Midnight', 'Hereditary', 'Fifty Shades Darker', 'Fifty Shades Freed', 'Gone Girl', 'Margin Call', 'The Florida Project', 'Martha Marcy May Marlene', 'Flight', 'Quartet', 'We Are Your Friends', 'Django Unchained', 'Carol', 'Mommy', 'Addicted', 'The Ides of March', 'Sound of My Voice', 'Knock Knock', 'Arbitrage', 'Ex Machina', 'Room', 'Zero Dark Thirty', 'The Debt', 'Melancholia', 'For Colored Girls', 'Endless Love', 'If Beale Street Could Talk', 'We Need to Talk About Kevin', 'Nocturnal Animals', 'Let Me In', 'Priest', 'The Water Diviner', 'Crimson Peak', 'The Master', 'Raggedy Man', 'Zoot Suit', 'Palo Alto', 'Rich and Famous', 'Take Shelter', 'Locke']
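
The names and the values are sorted in separate cells, which relies on both sorts producing the same order. As a sketch of a single-pass alternative, the pairs can be sorted once and then split. The names pairs and profitable and the three sample rows are hypothetical.

```python
sample_names = ["Movie A", "Movie B", "Movie C"]
sample_percent_profit = [25.0, -30.0, 410.0]

# Pair each value with its name, then sort by value, highest first.
pairs = sorted(zip(sample_percent_profit, sample_names), reverse=True)

# Keep profitable rows only and split back into two aligned lists.
profitable = [(value, name) for value, name in pairs if value > 0]
names_sorted = [name for _, name in profitable]
values_sorted = [value for value, _ in profitable]

print(names_sorted)   # ['Movie C', 'Movie A']
print(values_sorted)  # [410.0, 25.0]
```

Because each name travels with its value, the two output lists cannot drift out of step.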

Checking the number of elements in the 'final_name_r' list.

In [334]:
len(final_name_r)
Out[334]:
55

The dataframe_RIO_r dataframe is created.

In [164]:
dataframe_RIO_r = pd.DataFrame({"Name of Movie":final_name_r, 
                                "Money Generated for Every $1 Spent":currency_R})

The 'dataframe_RIO_r' dataframe. (this dataframe is interactive)

In [419]:
dataframe_RIO_r
Out[419]:
Name of Movie Money Generated for Every $1 Spent

Getting the RIO of all the 'G' rated movies.

In [166]:
RIO_G = []
for i in g_percent_profit:
    i /= 10
    RIO_G.append(i)
    
RIO_G.sort(reverse=True)
print(RIO_G) #showing the RIO_G list
[3113.5, 339.0, 209.3, 209.2, 162.9, 130.0, 72.0, 60.6, 57.8, 40.4, 37.7, 37.2, 36.5, 33.4, 32.4, 26.7, 26.6, 24.4, 19.1, 16.6, 8.3, 7.7, 6.9, 5.1, 5.0, -2.3, -4.1, -4.1, -5.2, -6.4, -8.0, -8.8, -9.9, -9.9]

Checking the number of positive values in the 'RIO_G' list. The last 9 entries, which are negative, are dropped.

In [679]:
len(RIO_G[:-9])
Out[679]:
25

Formatting the RIO values as currency strings (dollars).

In [167]:
currency_G = []
for i in RIO_G[:-9]:
    currency_G.append("${:,.2f}".format(i))
print(currency_G) #showing the currency_G list
['$3,113.50', '$339.00', '$209.30', '$209.20', '$162.90', '$130.00', '$72.00', '$60.60', '$57.80', '$40.40', '$37.70', '$37.20', '$36.50', '$33.40', '$32.40', '$26.70', '$26.60', '$24.40', '$19.10', '$16.60', '$8.30', '$7.70', '$6.90', '$5.10', '$5.00']

Checking the number of elements in the 'currency_G' list.

In [681]:
len(currency_G)
Out[681]:
25

Getting the mean RIO of the 'G' rated movies that have a positive RIO.

In [682]:
avg_G = statistics.mean(RIO_G[:-9])
avg_G
Out[682]:
188.732
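
The mean above is much larger than most of the individual values because a single entry (3113.5) dominates the sum. A short sketch using the 25 values printed above shows how the median and a mean without the top value compare. Dropping the top value is only an illustration here, not a step the analysis takes.

```python
import statistics

# The 25 positive 'G' values printed above, sorted highest first.
rio_g = [3113.5, 339.0, 209.3, 209.2, 162.9, 130.0, 72.0, 60.6, 57.8, 40.4,
         37.7, 37.2, 36.5, 33.4, 32.4, 26.7, 26.6, 24.4, 19.1, 16.6,
         8.3, 7.7, 6.9, 5.1, 5.0]

mean_all = statistics.mean(rio_g)
median_all = statistics.median(rio_g)
mean_without_top = statistics.mean(rio_g[1:])  # skips the first (largest) value

print(round(mean_all, 3))          # 188.732
print(median_all)                  # 36.5
print(round(mean_without_top, 2))  # 66.87
```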

Getting the 25th, 50th and the 75th percentiles of the RIO of all the 'G' rated movies.

In [683]:
np.percentile(RIO_G[:-9], [25,50,75])
Out[683]:
array([19.1, 36.5, 72. ])

Getting the names of the 'G' rated movies, in the same order as the RIO values, to create the dataframe_RIO_g dataframe.

In [168]:
final_name_g = []
reversed_name = []
for x,i in enumerate(g_name):
    reversed_name.append((g_percent_profit[x], i))
reversed_name.sort(reverse = True)
for i in reversed_name[:-9]:
    final_name_g.append(i[1])
print(final_name_g) #showing the final_name_g list
['Bambi 1942', 'The Sound of Music', 'Beauty and the Beast 1991', 'The Lion King 1994', 'The Secret Garden', 'The Black Stallion', 'Babe', 'Three Cions in the Fountain', 'Lassie Come Home', 'The Ten Commandments 1966', "Hachiko: A Dog's Story", 'Giant', 'The Hunchback of Notre Drame', 'The Quiet Man', 'My Fair Lady 1964', 'The Rookie', 'The Rookie', 'A Sunday in the Country', 'The Little Rascals', 'Prancer', 'Ramona and Beezus', 'Kit Kittredge: An American Girl', "Charlotte's Web", 'The Tale of Despereaux', 'Pollyanna']

Checking the number of elements in the 'final_name_g' list.

In [340]:
len(final_name_g)
Out[340]:
25

The dataframe_RIO_g dataframe is created.

In [169]:
dataframe_RIO_g = pd.DataFrame({"Name of Movie":final_name_g, 
                                "Money Generated for Every $1 Spent":currency_G})

The 'dataframe_RIO_g' dataframe. (this dataframe is interactive)

In [420]:
dataframe_RIO_g
Out[420]:
Name of Movie Money Generated for Every $1 Spent

Getting the RIO of all the 'PG' rated movies.

In [171]:
RIO_PG = []
for i in pg_percent_profit:
    i /= 10
    RIO_PG.append(i)
    
RIO_PG.sort(reverse=True)
print(RIO_PG) #showing the RIO_PG list
[1088.7, 659.5, 632.6, 313.3, 236.6, 236.6, 214.7, 143.0, 142.3, 87.6, 74.1, 70.9, 69.5, 66.2, 63.6, 62.0, 54.0, 47.1, 44.2, 37.5, 36.9, 36.3, 36.2, 34.6, 32.6, 30.2, 26.2, 25.6, 23.9, 19.5, 18.7, 16.0, 15.6, 13.7, 11.5, 10.3, 9.9, 9.0, 9.0, 8.8, 7.1, 5.2, 3.8, 2.9, 2.1, 0.0, -0.4, -0.5, -2.1, -2.2, -2.6, -3.0, -3.3, -4.5, -4.7, -4.8, -4.9, -5.5, -5.7, -5.9, -6.2, -7.0, -7.3, -8.1, -8.9, -9.0, -9.9]

Checking the number of positive values in the 'RIO_PG' list. The last 22 entries, which are zero or negative, are dropped.

In [688]:
len(RIO_PG[:-22])
Out[688]:
45

Formatting the RIO values as currency strings (dollars).

In [172]:
currency_PG = []
for i in RIO_PG[:-22]:
    currency_PG.append("${:,.2f}".format(i))
print(currency_PG) #showing the currency_PG list
['$1,088.70', '$659.50', '$632.60', '$313.30', '$236.60', '$236.60', '$214.70', '$143.00', '$142.30', '$87.60', '$74.10', '$70.90', '$69.50', '$66.20', '$63.60', '$62.00', '$54.00', '$47.10', '$44.20', '$37.50', '$36.90', '$36.30', '$36.20', '$34.60', '$32.60', '$30.20', '$26.20', '$25.60', '$23.90', '$19.50', '$18.70', '$16.00', '$15.60', '$13.70', '$11.50', '$10.30', '$9.90', '$9.00', '$9.00', '$8.80', '$7.10', '$5.20', '$3.80', '$2.90', '$2.10']

Checking the number of elements in the 'currency_PG' list.

In [690]:
len(currency_PG)
Out[690]:
45

Getting the mean RIO of the 'PG' rated movies that have a positive RIO.

In [691]:
avg_PG = statistics.mean(RIO_PG[:-22])
avg_PG
Out[691]:
106.43555555555555

Getting the 25th, 50th and the 75th percentiles of the RIO of all the 'PG' rated movies.

In [692]:
np.percentile(RIO_PG[:-22], [25,50,75])
Out[692]:
array([13.7, 36.2, 70.9])

Getting the names of the 'PG' rated movies, in the same order as the RIO values, to create the dataframe_RIO_pg dataframe.

In [173]:
final_name_pg = []
reversed_name = []
for x,i in enumerate(pg_name):
    reversed_name.append((pg_percent_profit[x], i))
reversed_name.sort(reverse = True)
for i in reversed_name[:-22]:
    final_name_pg.append(i[1])
print(final_name_pg) #showing the final_name_pg list
['Tex', 'Fireproof', 'The Jazz Singer', "God's Not Dead", 'War Room', 'War Room', 'Resurrection', 'Wonder', 'Wonder', 'Footloose', 'Sense and Sensibility', 'Bridge to Terabithia', 'On Golden Pond', 'Overcomer', 'Rocky III', 'The Lunchbox', 'Forever Young', 'Cinderella', 'Little Women', 'Phenomenon', 'Urban Cowboy', 'The Last Song', "Mr. Holland's Opus", 'The Last Song', 'The Remains of the Day', 'A Walk to Remember', 'A River Runs Through It', 'Honeysuckle Rose', 'Absence of Malice', 'Staying Alive', 'The Lake House', 'Dolphin Tale', 'Taps', 'Akeelah and the Bee', 'August Rush', 'The Secret of Roan Inish', 'The Night the Lights Went Out in Georgia', 'Somewhere in Time', 'Contact', 'Tender Mercies', 'The Natural', 'Pure Country', 'The Spanish Prisoner', 'Tuck Everlasting', 'Dreamer']

Checking the number of elements in the 'final_name_pg' list.

In [694]:
len(final_name_pg)
Out[694]:
45

The dataframe_RIO_pg dataframe is created.

In [174]:
dataframe_RIO_pg = pd.DataFrame({"Name of Movie":final_name_pg, 
                                 "Money Generated for Every $1 Spent":currency_PG})

The 'dataframe_RIO_pg' dataframe. (this dataframe is interactive)

In [421]:
dataframe_RIO_pg
Out[421]:
Name of Movie Money Generated for Every $1 Spent

Getting the RIO of all the 'PG-13' rated movies.

In [176]:
RIO_PG13 = []
for i in pg13_percent_profit:
    i /= 10
    RIO_PG13.append(i)
    
RIO_PG13.sort(reverse=True)
print(RIO_PG13) #showing the RIO_PG13 list
[287.6, 186.8, 165.9, 139.1, 110.2, 94.1, 80.9, 76.0, 75.2, 74.6, 73.4, 62.1, 61.2, 55.9, 53.1, 51.6, 48.8, 46.8, 46.4, 42.8, 40.2, 36.9, 33.2, 32.7, 32.4, 32.1, 30.6, 30.0, 29.7, 29.4, 28.7, 27.9, 27.9, 25.9, 25.3, 23.6, 23.2, 22.8, 21.6, 20.7, 17.4, 17.3, 16.8, 16.6, 16.3, 15.4, 14.4, 14.3, 13.7, 13.0, 12.9, 12.4, 11.7, 9.6, 8.8, 7.6, 6.5, 5.8, 4.5, 3.7, 2.4, 1.9, 1.1, 1.0, -1.7, -2.0, -2.1, -2.2, -4.2, -4.5, -4.7, -5.7, -5.9, -7.2, -7.6, -7.7]

Checking the number of positive values in the 'RIO_PG13' list. The last 12 entries, which are negative, are dropped.

In [699]:
len(RIO_PG13[:-12])
Out[699]:
64

Formatting the RIO values as currency strings (dollars).

In [177]:
currency_PG13 = []
for i in RIO_PG13[:-12]:
    currency_PG13.append("${:,.2f}".format(i))
print(currency_PG13) #showing the currency_PG13 list
['$287.60', '$186.80', '$165.90', '$139.10', '$110.20', '$94.10', '$80.90', '$76.00', '$75.20', '$74.60', '$73.40', '$62.10', '$61.20', '$55.90', '$53.10', '$51.60', '$48.80', '$46.80', '$46.40', '$42.80', '$40.20', '$36.90', '$33.20', '$32.70', '$32.40', '$32.10', '$30.60', '$30.00', '$29.70', '$29.40', '$28.70', '$27.90', '$27.90', '$25.90', '$25.30', '$23.60', '$23.20', '$22.80', '$21.60', '$20.70', '$17.40', '$17.30', '$16.80', '$16.60', '$16.30', '$15.40', '$14.40', '$14.30', '$13.70', '$13.00', '$12.90', '$12.40', '$11.70', '$9.60', '$8.80', '$7.60', '$6.50', '$5.80', '$4.50', '$3.70', '$2.40', '$1.90', '$1.10', '$1.00']

Checking the number of elements in the 'currency_PG13' list.

In [349]:
len(currency_PG13)
Out[349]:
64

Getting the names of the 'PG-13' rated movies, in the same order as the RIO values, to create the dataframe_RIO_pg13 dataframe.

In [178]:
final_name_pg13 = []
reversed_name = []
for x,i in enumerate(pg13_name):
    reversed_name.append((pg13_percent_profit[x], i))
reversed_name.sort(reverse = True)
for i in reversed_name[:-12]:
    final_name_pg13.append(i[1])
print(final_name_pg13) #showing the final_name_pg13 list
['Lights Out', 'A Quiet Place', 'Courageous', 'Like Crazy', 'Another Earth', 'Me Before You', 'Ouija: Origin of Evil', 'The Woman in Black', 'The Help', 'Sing', 'Still Alice', 'True Grit', 'If I Stay', 'The Vow', 'Gravity', 'Everything, Everything', 'Ida', 'Dear John', 'Brooklyn', 'Gifted', 'Step Up Revolution', 'Creed', 'Arrival', 'Creed II', 'The Impossible', 'The Bye Bye Man', 'Bridge of Spies', 'The Book Thief', 'Mustang', 'One Day', 'The Lucky One', 'Before I Fall', 'Amour', 'The Post', 'Remember Me', 'Safe Haven', 'Rings', 'The Roommate', 'Mud', 'Water for Elephants', 'Project Almanac', 'The Words', 'Fences', 'The Giver', 'The Rite', 'The Perks of Being a Wallflower', 'Black or White', 'Suffragette', 'Collateral Beauty', 'The Age of Adaline', 'Contagion', 'Beastly', 'Hereafter', 'Wish Upon', 'The Longest Ride', 'The Tree of Life', 'Burlesque', 'The Best of Me', 'Anna Karenina', 'Country Strong', 'Rabbit Hole', 'Draft Day', 'The Light Between Oceans', 'Charlie St. Cloud']

Checking the number of elements in the 'final_name_pg13' list.

In [702]:
len(final_name_pg13)
Out[702]:
64

Getting the mean RIO of the 'PG-13' rated movies that have a positive RIO.

In [703]:
avg_PG13 = statistics.mean(RIO_PG13[:-12])
avg_PG13
Out[703]:
41.44375

Getting the 25th, 50th and the 75th percentiles of the RIO of all the 'PG-13' rated movies.

In [352]:
np.percentile(RIO_PG13[:-12], [25,50,75])
Out[352]:
array([14.15, 27.9 , 49.5 ])

The dataframe_RIO_pg13 dataframe is created.

In [179]:
dataframe_RIO_pg13 = pd.DataFrame({"Name of Movie":final_name_pg13, 
                                   "Money Generated for Every $1 Spent":currency_PG13})

The 'dataframe_RIO_pg13' dataframe. (this dataframe is interactive)

In [422]:
dataframe_RIO_pg13
Out[422]:
Name of Movie Money Generated for Every $1 Spent

Getting the RIO of all the 'NC-17' rated movies.

In [181]:
RIO_NC = []
for i in nc17_percent_profit:
    i /= 10
    RIO_NC.append(i)
    
RIO_NC.sort(reverse=True)
print(RIO_NC) #showing the RIO_NC list
[334.8, 279.2, 191.7, 167.6, 159.3, 155.7, 145.7, 128.9, 126.1, 126.1, 99.3, 80.0, 66.1, 38.7, 37.8, 34.7, 33.4, 33.4, 28.2, 21.4, 21.4, 21.4, 21.4, 16.1, 14.0, 13.2, 10.4, 6.9, 4.8, 3.9, 0.7, 0.2, 0.1, 0.1, -0.7, -1.6, -5.2, -5.3, -5.5, -5.5, -6.0, -6.3, -6.9, -7.8, -7.9, -8.5, -8.7, -9.0, -9.3]

Checking the number of positive values in the 'RIO_NC' list. The last 15 entries, which are negative, are dropped.

In [705]:
len(RIO_NC[:-15])
Out[705]:
34

Formatting the RIO values as currency strings (dollars).

In [182]:
currency_NC = []
for i in RIO_NC[:-15]:
    currency_NC.append("${:,.2f}".format(i))
print(currency_NC) #showing the currency_NC list
['$334.80', '$279.20', '$191.70', '$167.60', '$159.30', '$155.70', '$145.70', '$128.90', '$126.10', '$126.10', '$99.30', '$80.00', '$66.10', '$38.70', '$37.80', '$34.70', '$33.40', '$33.40', '$28.20', '$21.40', '$21.40', '$21.40', '$21.40', '$16.10', '$14.00', '$13.20', '$10.40', '$6.90', '$4.80', '$3.90', '$0.70', '$0.20', '$0.10', '$0.10']

Checking the number of elements in the 'currency_NC' list.

In [709]:
len(currency_NC)
Out[709]:
34

Getting the names of the 'NC-17' rated movies, in the same order as the RIO values, to create the dataframe_RIO_NC dataframe.

In [183]:
final_name_NC = []
reversed_name = []
for x,i in enumerate(nc17_name):
    reversed_name.append((nc17_percent_profit[x], i))
reversed_name.sort(reverse = True)
for i in reversed_name[:-15]:
    final_name_NC.append(i[1])
print(final_name_NC) #showing the final_name_NC list
['Pink Flamingos', 'Last Tango in Paris', 'Whore 1991', 'Hell', 'Clerks', 'Blue Valentine', 'Crash', 'Tokyo Decadence', 'Kids', 'Kids', 'Crash', 'Beyond the Valley of the Dolls', 'The Evil Dead', 'Blue Is the Warmest Colour', 'Blue Is the Warmest Colour', 'Lust, Caution', 'Se, jie', 'Lust, Caution ', 'Arabian Nights', 'Shame', 'Shame', 'Shame', 'Shame', 'Happiness 1998', 'Law of Desire', 'Two Girls and a Guy', 'Bad Lieutenant', 'Wide Sargasso Sea', 'Natural Born Killers', 'Matador', 'Elles', 'The Dreamers', 'Whore', 'The Dreamers']
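
Several titles appear more than once in the list above (for example 'Shame' and 'Kids'). If the repeated rows are not intended, one possible cleanup, sketched here under that assumption, is to drop exact duplicate (name, value) pairs while keeping the original order. The sample rows are a subset of the names and values printed above.

```python
rows = [("Crash", 145.7), ("Kids", 126.1), ("Kids", 126.1),
        ("Crash", 99.3), ("Shame", 21.4), ("Shame", 21.4)]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_rows = list(dict.fromkeys(rows))

print(unique_rows)
# [('Crash', 145.7), ('Kids', 126.1), ('Crash', 99.3), ('Shame', 21.4)]
```

Rows that share a title but have different values, such as the two 'Crash' entries, are kept, since they may be different films.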

Checking the number of elements in the 'final_name_NC' list.

In [711]:
len(final_name_NC)
Out[711]:
34

Getting the mean RIO of the 'NC-17' rated movies that have a positive RIO.

In [712]:
avg_NC = statistics.mean(RIO_NC[:-15])
avg_NC
Out[712]:
71.25588235294117

Getting the 25th, 50th and the 75th percentiles of the RIO of all the 'NC-17' rated movies.

In [713]:
np.percentile(RIO_NC[:-15], [25,50,75])
Out[713]:
array([ 13.4,  33.4, 126.1])
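
np.percentile uses linear interpolation by default. As a cross-check, sketched here with the standard library only, statistics.quantiles with method="inclusive" applies the same interpolation and should reproduce the three values above. The list is the 34 positive NC-17 values printed earlier.

```python
import statistics

rio_nc = [334.8, 279.2, 191.7, 167.6, 159.3, 155.7, 145.7, 128.9, 126.1,
          126.1, 99.3, 80.0, 66.1, 38.7, 37.8, 34.7, 33.4, 33.4, 28.2,
          21.4, 21.4, 21.4, 21.4, 16.1, 14.0, 13.2, 10.4, 6.9, 4.8, 3.9,
          0.7, 0.2, 0.1, 0.1]

# n=4 returns the three cut points between quartiles (25th, 50th, 75th).
q25, q50, q75 = statistics.quantiles(rio_nc, n=4, method="inclusive")

print(round(q25, 2), round(q50, 2), round(q75, 2))  # 13.4 33.4 126.1
```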

The dataframe_RIO_NC dataframe is created.

In [184]:
dataframe_RIO_NC = pd.DataFrame({"Name of Movie":final_name_NC, 
                                 "Money Generated for Every $1 Spent":currency_NC})

The 'dataframe_RIO_NC' dataframe. (this dataframe is interactive)

In [423]:
dataframe_RIO_NC
Out[423]:
Name of Movie Money Generated for Every $1 Spent

Styling the first portion of the 'dataframe_RIO_r' dataframe (the first 19 rows) as the 'dataframe_RIO_r1' dataframe.

In [728]:
dataframe_RIO_r1  = dataframe_RIO_r[:19].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #ff5500')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","#ff5500")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid #ff5500'),
                                                                           ("color","#ff5500")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','#ff5500')]}])
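
The same four style rules are repeated for every table in this part of the notebook, with only the colour changing. A sketch of a helper that builds the list once is shown below. The name rio_table_styles is hypothetical; it is not a function from the notebook or from pandas.

```python
def rio_table_styles(color: str) -> list:
    """Return the four table-style rules used for the RIO tables in one colour."""
    return [
        # Outer border of the table (same ' ' selector as the cell above).
        {"selector": " ", "props": [("border", f"10px solid {color}")]},
        # Column headings.
        {"selector": "thead",
         "props": [("background-color", "white"), ("color", color)]},
        # Data cells.
        {"selector": "td",
         "props": [("background-color", "white"),
                   ("border-bottom", f"4px solid {color}"),
                   ("color", color)]},
        # Index (row heading) cells.
        {"selector": "th.row_heading",
         "props": [("background-color", "white"), ("color", color)]},
    ]


styles_r = rio_table_styles("#ff5500")
print(styles_r[0])
```

With such a helper, a call like dataframe_RIO_r[:19].style.set_table_styles(rio_table_styles('#ff5500')) would be expected to produce the same styling as the cell above.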

Saving the dataframe_RIO_r1 dataframe to the dataframe_RIO_r1.png file as an image to be used for the analysis later on.

In [729]:
dfi.export(dataframe_RIO_r1, 'dataframe_RIO_r1.png')

The 'dataframe_RIO_r1' dataframe.

Styling the second portion of the 'dataframe_RIO_r' dataframe (rows 19 to 36) as the 'dataframe_RIO_r2' dataframe.

In [730]:
dataframe_RIO_r2  = dataframe_RIO_r[19:37].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #ff5500')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","#ff5500")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid #ff5500'),
                                                                           ("color","#ff5500")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','#ff5500')]}])

Saving the dataframe_RIO_r2 dataframe to the dataframe_RIO_r2.png file as an image to be used for the analysis later on.

In [731]:
dfi.export(dataframe_RIO_r2, 'dataframe_RIO_r2.png')

The 'dataframe_RIO_r2' dataframe.

Styling the last portion of the 'dataframe_RIO_r' dataframe (rows 37 onward) as the 'dataframe_RIO_r3' dataframe.

In [732]:
dataframe_RIO_r3  = dataframe_RIO_r[37:].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #ff5500')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","#ff5500")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid #ff5500'),
                                                                           ("color","#ff5500")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','#ff5500')]}])

Saving the dataframe_RIO_r3 dataframe to the dataframe_RIO_r3.png file as an image to be used for the analysis later on.

In [733]:
dfi.export(dataframe_RIO_r3, 'dataframe_RIO_r3.png')

The 'dataframe_RIO_r3' dataframe.

Styling the first portion of the 'dataframe_RIO_g' dataframe (the first 12 rows) as the 'dataframe_RIO_g1' dataframe.

In [734]:
dataframe_RIO_g1 = dataframe_RIO_g[:12].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid red')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","red")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid red'),
                                                                           ("color","red")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','red')]}])

Saving the dataframe_RIO_g1 dataframe to the dataframe_RIO_g1.png file as an image to be used for the analysis later on.

In [736]:
dfi.export(dataframe_RIO_g1, 'dataframe_RIO_g1.png')

The 'dataframe_RIO_g1' dataframe.

Styling the second portion of the 'dataframe_RIO_g' dataframe (rows 12 onward) as the 'dataframe_RIO_g2' dataframe.

In [735]:
dataframe_RIO_g2 = dataframe_RIO_g[12:].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid red')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","red")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid red'),
                                                                           ("color","red")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','red')]}])

Saving the dataframe_RIO_g2 dataframe to the dataframe_RIO_g2.png file as an image to be used for the analysis later on.

In [737]:
dfi.export(dataframe_RIO_g2, 'dataframe_RIO_g2.png')

The 'dataframe_RIO_g2' dataframe.

Styling the first portion of the 'dataframe_RIO_pg' dataframe (the first 22 rows) as the 'dataframe_RIO_pg1' dataframe.

In [739]:
dataframe_RIO_pg1 = dataframe_RIO_pg[:22].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #fa5f55')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","#fa5f55")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid #fa5f55'),
                                                                           ("color","#fa5f55")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','#fa5f55')]}])

Saving the dataframe_RIO_pg1 dataframe to the dataframe_RIO_pg1.png file as an image to be used for the analysis later on.

In [740]:
dfi.export(dataframe_RIO_pg1, 'dataframe_RIO_pg1.png')

The 'dataframe_RIO_pg1' dataframe.

Styling the second portion of the 'dataframe_RIO_pg' dataframe (rows 22 onward) as the 'dataframe_RIO_pg2' dataframe.

In [741]:
dataframe_RIO_pg2 = dataframe_RIO_pg[22:].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #fa5f55')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","#fa5f55")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid #fa5f55'),
                                                                           ("color","#fa5f55")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','#fa5f55')]}])

Saving the dataframe_RIO_pg2 dataframe to the dataframe_RIO_pg2.png file as an image to be used for the analysis later on.

In [742]:
dfi.export(dataframe_RIO_pg2, 'dataframe_RIO_pg2.png')

The 'dataframe_RIO_pg2' dataframe.

Styling the first portion of the 'dataframe_RIO_pg13' dataframe (the first 22 rows) as the 'dataframe_RIO_pg131' dataframe.

In [743]:
dataframe_RIO_pg131 = dataframe_RIO_pg13[:22].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #DE3163')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","#DE3163")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid #DE3163'),
                                                                           ("color","#DE3163")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','#DE3163')]}])

Saving the dataframe_RIO_pg131 dataframe to the dataframe_RIO_pg131.png file as an image to be used for the analysis later on.

In [746]:
dfi.export(dataframe_RIO_pg131, 'dataframe_RIO_pg131.png')

The 'dataframe_RIO_pg131' dataframe.

Styling the second portion of the 'dataframe_RIO_pg13' dataframe (rows 22 to 41) as the 'dataframe_RIO_pg132' dataframe.

In [744]:
dataframe_RIO_pg132 = dataframe_RIO_pg13[22:42].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #DE3163')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","#DE3163")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid #DE3163'),
                                                                           ("color","#DE3163")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','#DE3163')]}])

Saving the dataframe_RIO_pg132 dataframe to the dataframe_RIO_pg132.png file as an image to be used for the analysis later on.

In [747]:
dfi.export(dataframe_RIO_pg132, 'dataframe_RIO_pg132.png')

The 'dataframe_RIO_pg132' dataframe.

Styling the last portion of the 'dataframe_RIO_pg13' dataframe (rows 42 onward) as the 'dataframe_RIO_pg133' dataframe.

In [745]:
dataframe_RIO_pg133 = dataframe_RIO_pg13[42:].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #DE3163')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","#DE3163")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid #DE3163'),
                                                                           ("color","#DE3163")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','#DE3163')]}])

Saving the dataframe_RIO_pg133 dataframe to the dataframe_RIO_pg133.png file as an image to be used for the analysis later on.

In [748]:
dfi.export(dataframe_RIO_pg133, 'dataframe_RIO_pg133.png')

The 'dataframe_RIO_pg133' dataframe.

Styling the first portion of the 'dataframe_RIO_NC' dataframe (the first 17 rows) as the 'dataframe_RIO_NC1' dataframe.

In [754]:
dataframe_RIO_NC1 = dataframe_RIO_NC[:17].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #581845')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","#581845")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid #581845'),
                                                                           ("color","#581845")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','#581845')]}])

Saving the dataframe_RIO_NC1 dataframe to the dataframe_RIO_NC1.png file as an image to be used for the analysis later on.

In [757]:
dfi.export(dataframe_RIO_NC1, 'dataframe_RIO_NC1.png')

The 'dataframe_RIO_NC1' dataframe.

Styling the second portion of the 'dataframe_RIO_NC' dataframe (rows 17 onward) as the 'dataframe_RIO_NC2' dataframe.

In [755]:
dataframe_RIO_NC2 = dataframe_RIO_NC[17:].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #581845')]},
                                     {"selector":"thead", 'props':[("background-color","white"), 
                                    ("color","#581845")]},
                            {'selector':"td", "props":[("background-color","white"),('border-bottom',
                                                                                    '4px solid #581845'),
                                                                           ("color","#581845")]},
                        {'selector':'th.row_heading', 'props':[('background-color','white'),
                                                              ('color','#581845')]}])

Saving the dataframe_RIO_NC2 dataframe to the dataframe_RIO_NC2.png file as an image to be used for the analysis later on.

In [756]:
dfi.export(dataframe_RIO_NC2, 'dataframe_RIO_NC2.png')

The 'dataframe_RIO_NC2' dataframe.

Getting the budget spent on all of the R-rated movies.

In [198]:
cost = []
for i in system_rating_r.Cost:
    i = int(i.replace('$', '').replace(',', ''))
    cost.append(i)
print(cost) #showing the cost list
[100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52500000, 40000000, 37500000, 31000000, 23000000, 22500000, 22500000, 21000000, 20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000, 11800000, 11000000, 10000000, 9400000, 8500000, 7000000, 5000000, 4900000, 4750000, 4000000, 3500000, 3400000, 3300000, 3000000, 2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 1987650, 1500000, 1000000, 1000000, 1000000, 135000, 100000, 6000000, 8500000, 20000000, 100000, 2700000, 11500000, 9000000]
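
The conversion above can also be written as a small reusable helper. This is only a sketch: the function name 'parse_currency' and the sample strings are hypothetical, and it assumes every value is a whole-dollar amount written with a '$' sign and comma separators, as in the 'Cost' column.

```python
def parse_currency(text):
    """Convert a string such as '$1,987,650' to the integer 1987650."""
    return int(text.replace('$', '').replace(',', ''))

# Hypothetical sample values in the same format as the 'Cost' column.
sample_costs = ['$100,000,000', '$1,987,650', '$135,000']
parsed = [parse_currency(value) for value in sample_costs]
print(parsed)
```

With the real data, the same helper could be applied as a list comprehension over system_rating_r.Cost.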

Checking the number of elements in the 'cost' list.

In [199]:
len(cost)
Out[199]:
55

Putting the cost of all the R-rated movies into a dataframe called df_cost_r.

In [200]:
df_cost_r = pd.DataFrame({"Cost":cost})

The 'df_cost_r' dataframe. (this dataframe is interactive)

In [424]:
df_cost_r
Out[424]:
Cost

Getting the Arithmetic Mean of all the expenses spent on the R-rated movies.

In [372]:
x = statistics.mean(cost)
print("Arithmetic Mean of the cost for the R-rated movies is:", x)
Arithmetic Mean of the cost for the R-rated movies is: 16455866.363636363

Getting the Median of all the expenses spent on the R-rated movies.

In [373]:
print("Median of the cost for the R-rated movies is:", statistics.median(cost))
Median of the cost for the R-rated movies is: 9000000

Getting the Mode of all the expenses spent on the R-rated movies.

In [374]:
print("Mode of the cost for the R-rated movies is:",statistics.mode(cost))
Mode of the cost for the R-rated movies is: 2000000

Getting the sample Standard Deviation (ddof=1) of all the expenses spent on the R-rated movies.

In [375]:
print("Standard deviation of the cost for the R-rated movies is:", np.std(cost, ddof=1))
Standard deviation of the cost for the R-rated movies is: 20757148.21084636

Getting the sample Variance of all the expenses spent on the R-rated movies.

In [376]:
print("Variance of the cost for the R-rated movies is:",statistics.variance(cost))
Variance of the cost for the R-rated movies is: 430859201847042.1

Getting the Coefficient of Variation (the standard deviation as a percentage of the mean) of all the expenses spent on the R-rated movies.

In [377]:
cv = lambda x: np.std(x, ddof=1) / np.mean(x) * 100
print("Coefficient of Variation of the cost for the R-rated movies is:", cv(cost))
Coefficient of Variation of the cost for the R-rated movies is: 126.13828863313347

Getting the First Quartile of all the expenses spent on the R-rated movies.

In [378]:
# First quartile (Q1)
Q1 = np.percentile(cost, 25, interpolation = 'midpoint')
print("The Q1 of the cost for the R-rated movies is:",Q1)
The Q1 of the cost for the R-rated movies is: 2350000.0

Getting the Third Quartile of all the expenses spent on the R-rated movies.

In [379]:
# Third quartile (Q3)
Q3 = np.percentile(cost, 75, interpolation = 'midpoint')
print("The Q3 of the cost for the R-rated movies is:",Q3) 
The Q3 of the cost for the R-rated movies is: 20500000.0

Getting the Interquartile Range of all the expenses spent on the R-rated movies.

In [380]:
# Interquartile range (IQR)
IQR = Q3 - Q1
print("The interquartile range of the cost for the R-rated movies is:",IQR)
The interquartile range of the cost for the R-rated movies is: 18150000.0
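
To make the 'midpoint' rule used for Q1 and Q3 concrete, here is a minimal pure-Python sketch of it. The function name 'midpoint_percentile' and the sample list are hypothetical; the sketch assumes the rule averages the two sorted values on either side of the position (n - 1) * q / 100. For the 55 budgets that position is 13.5 for Q1, so Q1 is the average of the 14th and 15th sorted values, (2,000,000 + 2,700,000) / 2 = 2,350,000. Newer NumPy releases spell the 'interpolation' argument 'method'.

```python
import math

def midpoint_percentile(values, q):
    """Percentile q (0-100) using the 'midpoint' rule: when the target
    position falls between two sorted values, return their average."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = ordered[math.floor(position)]
    upper = ordered[math.ceil(position)]
    return (lower + upper) / 2

sample = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data
q1 = midpoint_percentile(sample, 25)
q3 = midpoint_percentile(sample, 75)
iqr = q3 - q1
print(q1, q3, iqr)
```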

Getting Pearson’s Coefficient of Skewness of all the expenses spent on the R-rated movies.

In [381]:
def pearsons(mean, median, standard_deviation):
    skewness = (mean-median)*3/standard_deviation
    return skewness
print("Pearson’s Coefficient of Skewness of the cost for the R-rated movies is:", 
      pearsons( statistics.mean(cost),statistics.median(cost),np.std(cost, ddof=1)))   
Pearson’s Coefficient of Skewness of the cost for the R-rated movies is: 1.0775853630616372
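
As a worked check of the formula 3 * (mean - median) / standard deviation, the values printed earlier in this section can be plugged in directly. The variable names below are illustrative only.

```python
mean_cost = 16455866.363636363   # arithmetic mean printed above
median_cost = 9000000            # median printed above
std_cost = 20757148.21084636     # sample standard deviation printed above

skewness = 3 * (mean_cost - median_cost) / std_cost
print(round(skewness, 4))
```

A positive value means the mean sits above the median, which is what a long right tail of expensive movies produces.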

Applying Chebyshev's Theorem to all the expenses spent on the R-rated movies. The theorem states that at least 1 - 1/k^2 of the values lie within k standard deviations of the mean: at least 75% for k = 2 and at least 88.9% for k = 3.

In [382]:
def chebyshevs(mean, standard_deviation, num_std, previous_p):
    # previous_p is kept so existing calls still work; it is not used
    position_std = num_std*standard_deviation
    lower_range = mean - position_std
    if lower_range < 0: lower_range = 0 # a budget cannot be negative
    upper_range = mean + position_std
    if num_std == 2: 
        print('At least 75% of the budget of the r-rated movies ranges from',lower_range,'to',upper_range)
    if num_std == 3: 
        print('At least 88.9% of the budget of the r-rated movies ranges from',lower_range,'to',upper_range)
chebyshevs(16455866, 20757148, 2, 0)
chebyshevs(16455866, 20757148, 3, 57970162)
At least 75% of the budget of the r-rated movies ranges from 0 to 57970162
At least 88.9% of the budget of the r-rated movies ranges from 0 to 78727310
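
For reference, Chebyshev's theorem guarantees that at least 1 - 1/k^2 of any distribution lies within k standard deviations of the mean (for k > 1). The sketch below computes that bound and the matching interval for the rounded mean and standard deviation used above. The function names 'chebyshev_bound' and 'chebyshev_interval' are hypothetical, and clipping the lower end at 0 is an assumption based on budgets not being negative.

```python
def chebyshev_bound(k):
    """Minimum share of any distribution within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2

def chebyshev_interval(mean, standard_deviation, k, floor=0):
    """Interval mean +/- k*SD, with the lower end clipped at `floor`."""
    lower = max(floor, mean - k * standard_deviation)
    upper = mean + k * standard_deviation
    return lower, upper

for k in (2, 3):
    low, high = chebyshev_interval(16455866, 20757148, k)
    print(f"k={k}: at least {chebyshev_bound(k):.1%} of budgets lie between {low:,} and {high:,}")
```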

Getting the Kurtosis of all the expenses spent on the R-rated movies.

In [383]:
print('Kurtosis of the budget of the r-rated movies is:',kurtosis(cost, fisher=False))
print('Excess Kurtosis of the budget of the r-rated movies is:',
      (kurtosis(cost,fisher=False)-3))#leptokurtic
Kurtosis of the budget of the r-rated movies is: 6.718303925498721
Excess Kurtosis of the budget of the r-rated movies is: 3.718303925498721

Getting the Arithmetic Mean and the 10% Trimmed Mean of all the expenses spent on the R-rated movies.

In [384]:
print("Arithmetic Mean of the cost for the R-rated movies is:", statistics.mean(cost))
print('10% Trimmed mean of the budget of the r-rated movies is:',stats.trim_mean(cost, 0.10))
Arithmetic Mean of the cost for the R-rated movies is: 16455866.363636363
10% Trimmed mean of the budget of the r-rated movies is: 12705281.111111112
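
The trimmed mean can be reproduced by hand to see what is being removed. This sketch assumes the trimming rule drops int(proportion * n) values from each end of the sorted list, which for 55 values and 10% means the 5 cheapest and the 5 most expensive movies. The function name 'trimmed_mean' is hypothetical.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping int(proportion * n) values from each end."""
    ordered = sorted(values)
    cut = int(proportion * len(ordered))
    return statistics.mean(ordered[cut:len(ordered) - cut])

# The 55 R-rated budgets from the 'cost' list above.
cost = [
    100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52500000,
    40000000, 37500000, 31000000, 23000000, 22500000, 22500000, 21000000,
    20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000,
    11800000, 11000000, 10000000, 9400000, 8500000, 7000000, 5000000,
    4900000, 4750000, 4000000, 3500000, 3400000, 3300000, 3000000,
    2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 1987650,
    1500000, 1000000, 1000000, 1000000, 135000, 100000, 6000000,
    8500000, 20000000, 100000, 2700000, 11500000, 9000000,
]
print(trimmed_mean(cost, 0.10))
```

The trimmed mean is lower than the arithmetic mean because the five largest budgets (331,000,000 in total) pull the untrimmed mean upward.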

Getting the Z-score of each of the expenses spent on the R-rated movies.

In [385]:
stats.zscore(cost)
Out[385]:
array([ 4.06193284,  2.16574487,  2.11712467,  1.87402365,  1.87402365,
        1.87402365,  1.75247314,  1.14472059,  1.02317007,  0.70713875,
        0.31817711,  0.29386701,  0.29386701,  0.22093671,  0.1723165 ,
        0.1723165 , -0.16802493, -0.16802493, -0.16802493, -0.21664513,
       -0.21664513, -0.22636917, -0.26526534, -0.31388554, -0.34305766,
       -0.38681585, -0.45974615, -0.55698656, -0.56184858, -0.56914161,
       -0.60560677, -0.62991687, -0.63477889, -0.63964091, -0.65422697,
       -0.70284717, -0.70284717, -0.70284717, -0.70284717, -0.70284717,
       -0.70284717, -0.70344763, -0.72715728, -0.75146738, -0.75146738,
       -0.75146738, -0.79352386, -0.79522556, -0.50836636, -0.38681585,
        0.1723165 , -0.79522556, -0.66881303, -0.24095523, -0.36250575])
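
A z-score is (value - mean) / standard deviation. Note that stats.zscore uses the population standard deviation (ddof=0) by default, while the standard deviation reported earlier in this section uses ddof=1, so the two are not directly interchangeable. The sketch below shows the population version in pure Python; the function name 'z_scores' and the sample data are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize values with the population standard deviation (ddof=0)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(value - mean) / sd for value in values]

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, population SD 2
print(z_scores(sample))
```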

Separating the expenses spent on the R-rated movies into four categories: 'micro_bud' (up to 100,000), the lowest end of the expenses; 'low_bud' (100,001 to 15,000,000), the lower end; 'mid_bud' (15,000,001 to 50,000,000), the middle; and 'high_bud' (above 50,000,000), the higher end.

In [204]:
micro_bud = 0
low_bud = 0
mid_bud = 0
high_bud = 0
for i in cost:
    # the four ranges do not overlap, so one pass over the list is enough
    if 0 <= i <= 100000:micro_bud+=1
    elif 100001 <= i <= 15000000:low_bud+=1
    elif 15000001 <= i <= 50000000:mid_bud+=1
    elif 50000001 <= i:high_bud+=1

Showing how many movies are in each category. 'micro_bud' has 2 movies, 'low_bud' has 36 movies, 'mid_bud' has 10 movies and 'high_bud' has 7 movies.

In [205]:
micro_bud,low_bud,mid_bud,high_bud
Out[205]:
(2, 36, 10, 7)
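
The same tally can be sketched with a classifier function and a Counter, which keeps the thresholds in one place. The function name 'budget_tier' is hypothetical, and the sketch assumes every budget is a non-negative whole number.

```python
from collections import Counter

def budget_tier(amount):
    """Map a budget to one of the four categories used above."""
    if amount <= 100_000:
        return 'micro_bud'
    if amount <= 15_000_000:
        return 'low_bud'
    if amount <= 50_000_000:
        return 'mid_bud'
    return 'high_bud'

# The 55 R-rated budgets from the 'cost' list above.
cost = [
    100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52500000,
    40000000, 37500000, 31000000, 23000000, 22500000, 22500000, 21000000,
    20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000,
    11800000, 11000000, 10000000, 9400000, 8500000, 7000000, 5000000,
    4900000, 4750000, 4000000, 3500000, 3400000, 3300000, 3000000,
    2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 1987650,
    1500000, 1000000, 1000000, 1000000, 135000, 100000, 6000000,
    8500000, 20000000, 100000, 2700000, 11500000, 9000000,
]
tier_counts = Counter(budget_tier(amount) for amount in cost)
print(tier_counts)
```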

Creating a function called 'Bernoulli_Dist' that returns the proportion of movies in each category, that is, the probability that a randomly chosen R-rated movie falls in that category (the success probability of a Bernoulli trial).

In [206]:
def Bernoulli_Dist(micro,low,mid,high,n):
    micro_p = micro / n
    low_p = low / n
    mid_p = mid / n 
    high_p = high / n
    return micro_p, low_p, mid_p, high_p

Using the 'Bernoulli_Dist' function to get the proportion of the 55 movies that falls in each category.

In [207]:
p_vals = Bernoulli_Dist(micro_bud,low_bud,mid_bud,high_bud,55)
p_vals
Out[207]:
(0.03636363636363636,
 0.6545454545454545,
 0.18181818181818182,
 0.12727272727272726)

Separating the 35 movies with budgets between 1,000,000 and 15,000,000 into three categories; the 3 movies with budgets under 1,000,000 are left out. The first category is between 1,000,000 and 5,000,000. The second category is between 5,000,001 and 10,000,000. The third category is between 10,000,001 and 15,000,000.

In [208]:
group1 = 0
group2 =0 
group3 = 0
for i in cost:
    if 1000000 <= i <= 5000000:group1+=1
for i in cost:
    if 5000001 <= i <= 10000000:group2+=1
for i in cost:
    if 10000001 <= i <=15000000:group3+=1

The first category has 20 movies. The second category has 7 movies. The third category has 8 movies.

In [209]:
group1,group2,group3
Out[209]:
(20, 7, 8)

Using the 'Bernoulli_Dist' function to get the proportion of these 35 movies that falls in each category.

In [210]:
p_vals1 = Bernoulli_Dist(group1,group2,group3,0,35)
p_vals1
Out[210]:
(0.5714285714285714, 0.2, 0.22857142857142856, 0.0)

Separating the 'mid_bud' category, with a total of 10 movies, into three categories. The first category is between 15,000,001 and 20,000,000. The second category is between 20,000,001 and 30,000,000. The third category is between 30,000,001 and 50,000,000.

In [211]:
group1 = 0
group2 =0 
group3 = 0
for i in cost:
    if 15000001 <= i <= 20000000:group1+=1
for i in cost:
    if 20000001 <= i <= 30000000:group2+=1
for i in cost:
    if 30000001 <= i <=50000000:group3+=1

The first category has 3 movies. The second category has 4 movies. The third category has 3 movies.

In [212]:
group1,group2,group3
Out[212]:
(3, 4, 3)

Using the 'Bernoulli_Dist' function to get the proportion of these 10 movies that falls in each category.

In [213]:
p_vals2 = Bernoulli_Dist(group1,group2,group3,0,10)
p_vals2
Out[213]:
(0.3, 0.4, 0.3, 0.0)

Separating the 'high_bud' category, with a total of 7 movies, into three categories. The first category is between 50,000,001 and 60,000,000. The second category is between 60,000,001 and 70,000,000. The third category is between 90,000,000 and 100,000,000.

In [214]:
group1 = 0
group2 =0 
group3 = 0
for i in cost:
    if 50000001 <= i <= 60000000:group1+=1
for i in cost:
    if 60000001 <= i <= 70000000:group2+=1
for i in cost:
    if 90000000 <= i <=100000000:group3+=1

The first category has 5 movies. The second category has 1 movie. The third category has 1 movie.

In [215]:
group1,group2,group3
Out[215]:
(5, 1, 1)

Using the 'Bernoulli_Dist' function to get the proportion of these 7 movies that falls in each category.

In [216]:
p_vals3 = Bernoulli_Dist(group1,group2,group3,0,7)
p_vals3
Out[216]:
(0.7142857142857143, 0.14285714285714285, 0.14285714285714285, 0.0)

Rounding the expenses of R-rated movies to the nearest million and storing it in a list called 'freq_demo'.

In [217]:
freq_demo = []
for i in cost:
    freq_demo.append((round(i, -6)))
print(freq_demo) #showing the freq_demo list
[100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52000000, 40000000, 38000000, 31000000, 23000000, 22000000, 22000000, 21000000, 20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000, 12000000, 11000000, 10000000, 9000000, 8000000, 7000000, 5000000, 5000000, 5000000, 4000000, 4000000, 3000000, 3000000, 3000000, 2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 1000000, 1000000, 1000000, 0, 0, 6000000, 8000000, 20000000, 0, 3000000, 12000000, 9000000]
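
One detail in the output above is worth spelling out: Python's round() sends exact ties to the nearest even multiple, which is why 52,500,000 became 52,000,000 and 22,500,000 became 22,000,000, while 37,500,000 became 38,000,000. If ties should always round up instead, a helper like the one below could be used. The function name 'round_half_up_to_million' is hypothetical, and it assumes non-negative whole-dollar amounts.

```python
def round_half_up_to_million(amount):
    """Round a non-negative integer to the nearest million, ties rounding up."""
    return (amount + 500_000) // 1_000_000 * 1_000_000

for amount in (52_500_000, 37_500_000, 22_500_000):
    # built-in round() (ties to even) next to the half-up alternative
    print(amount, round(amount, -6), round_half_up_to_million(amount))
```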

Checking the number of elements in the 'freq_demo' list.

In [218]:
len(freq_demo)
Out[218]:
55

Getting the index of each element in the 'freq_demo' list.

In [219]:
index_freq = []
for i,x in enumerate(freq_demo):index_freq.append((i,x))
print(index_freq) #showing the index_freq list
[(0, 100000000), (1, 61000000), (2, 60000000), (3, 55000000), (4, 55000000), (5, 55000000), (6, 52000000), (7, 40000000), (8, 38000000), (9, 31000000), (10, 23000000), (11, 22000000), (12, 22000000), (13, 21000000), (14, 20000000), (15, 20000000), (16, 13000000), (17, 13000000), (18, 13000000), (19, 12000000), (20, 12000000), (21, 12000000), (22, 11000000), (23, 10000000), (24, 9000000), (25, 8000000), (26, 7000000), (27, 5000000), (28, 5000000), (29, 5000000), (30, 4000000), (31, 4000000), (32, 3000000), (33, 3000000), (34, 3000000), (35, 2000000), (36, 2000000), (37, 2000000), (38, 2000000), (39, 2000000), (40, 2000000), (41, 2000000), (42, 2000000), (43, 1000000), (44, 1000000), (45, 1000000), (46, 0), (47, 0), (48, 6000000), (49, 8000000), (50, 20000000), (51, 0), (52, 3000000), (53, 12000000), (54, 9000000)]

Checking the number of elements in the 'index_freq' list.

In [768]:
len(index_freq)
Out[768]:
55

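Replacing some elements in the 'freq_demo' list with nearby representative values so that similar budgets are counted together: for example, 6 to 9 million becomes 10 million, 21 to 23 million becomes 20 million, and the three values that rounded to 0 become 100,000.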

In [220]:
freq_demo[48] = 10000000
freq_demo[6] = 55000000
freq_demo[22] = 10000000
freq_demo[49] = 10000000
freq_demo[8] = 40000000
freq_demo[10] = 20000000
freq_demo[11] = 20000000
freq_demo[12] = 20000000
freq_demo[13] = 20000000
freq_demo[24] = 10000000
freq_demo[25] = 10000000
freq_demo[26] = 10000000
freq_demo[54] = 10000000
freq_demo[1] = 60000000
freq_demo[2] = 60000000
freq_demo[-4] = 100000
freq_demo[-8] = 100000
freq_demo[-9] = 100000

Getting the frequency of the repeated values of all the expenses spent on the R-rated Drama movies, which will be stored in a dictionary called 'freq_demo1'.

In [221]:
freq_demo1 = Counter((freq_demo))
print(freq_demo1)#showing the freq_demo1 dictionary
Counter({10000000: 8, 2000000: 8, 20000000: 7, 55000000: 4, 12000000: 4, 3000000: 4, 13000000: 3, 5000000: 3, 1000000: 3, 100000: 3, 60000000: 2, 40000000: 2, 4000000: 2, 100000000: 1, 31000000: 1})

Sorting the 'freq_demo1' dictionary in ascending order of budget.

In [222]:
freq_one = sorted(freq_demo1.items(), key=lambda i: i[0])
print(freq_one)#showing the freq_one list
[(100000, 3), (1000000, 3), (2000000, 8), (3000000, 4), (4000000, 2), (5000000, 3), (10000000, 8), (12000000, 4), (13000000, 3), (20000000, 7), (31000000, 1), (40000000, 2), (55000000, 4), (60000000, 2), (100000000, 1)]

Creating a list called 'cost_freq' with the cost of the R-rated Drama movies and creating another list called 'cost_freq_amount' with the frequency of the values in the 'cost_freq' list.

In [223]:
cost_freq = []
cost_freq_amount = []
for i in freq_one: 
    cost_freq_amount.append(i[1])
    cost_freq.append("${:,.0f}".format(i[0]))

The 'cost_freq' list.

In [224]:
print(cost_freq)#showing the cost_freq list
['$100,000', '$1,000,000', '$2,000,000', '$3,000,000', '$4,000,000', '$5,000,000', '$10,000,000', '$12,000,000', '$13,000,000', '$20,000,000', '$31,000,000', '$40,000,000', '$55,000,000', '$60,000,000', '$100,000,000']

Checking the number of elements in the 'cost_freq' list.

In [783]:
len(cost_freq)
Out[783]:
15

The 'cost_freq_amount' list.

In [225]:
print(cost_freq_amount)#showing the cost_freq_amount list
[3, 3, 8, 4, 2, 3, 8, 4, 3, 7, 1, 2, 4, 2, 1]

Checking the number of elements in the 'cost_freq_amount' list.

In [784]:
len(cost_freq_amount)
Out[784]:
15

Creating a Frequency Distribution Table called 'freq_dis' of all the expenses spent on the R-rated movies.

In [294]:
freq_dis = pd.DataFrame({"Amount of Budget (x)":cost_freq,
                                 "Frequency (f)":cost_freq_amount})

The 'freq_dis' table. (this table is interactive)

In [425]:
freq_dis
Out[425]:
Amount of Budget (x) Frequency (f)

Getting the Upper Values and Lower Values of all the expenses spent on the R-rated movies, for the Cumulative Frequency Distribution Table.

In [401]:
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
        
a =list(chunks(range(90001, 100000000), 10000000))
a
Out[401]:
[range(90001, 10090001),
 range(10090001, 20090001),
 range(20090001, 30090001),
 range(30090001, 40090001),
 range(40090001, 50090001),
 range(50090001, 60090001),
 range(60090001, 70090001),
 range(70090001, 80090001),
 range(80090001, 90090001),
 range(90090001, 100000000)]
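
The class boundaries can also be produced as plain (lower, upper) pairs without slicing a range of nearly 100 million numbers. This is a sketch only: the function name 'class_bounds' is hypothetical, and as with range objects the upper bound of each pair is exclusive.

```python
def class_bounds(start, stop, width):
    """Return (lower, upper) pairs that split [start, stop) into classes of `width`."""
    return [(lower, min(lower + width, stop)) for lower in range(start, stop, width)]

bounds = class_bounds(90001, 100000000, 10000000)
for lower, upper in bounds:
    print(f"{lower:>11,} to {upper:>11,}")
```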

Finalizing the Lower Values for the Cumulative Frequency Distribution Table.

In [228]:
lower_val = ['$90,000','$10,080,001', '$20,080,002', '$30,080,003', '$40,080,004', 
             '$50,080,005', '$60,080,006', '$70,080,007', '$80,080,008', '$90,080,009' ]
print(lower_val)#showing the lower_val list
['$90,000', '$10,080,001', '$20,080,002', '$30,080,003', '$40,080,004', '$50,080,005', '$60,080,006', '$70,080,007', '$80,080,008', '$90,080,009']

Checking the number of elements in the 'lower_val' list.

In [798]:
len(lower_val)
Out[798]:
10

Finalizing the Upper Values for the Cumulative Frequency Distribution Table.

In [229]:
upper_val = ['$10,080,000','$20,080,001','$30,080,002','$40,080,003','$50,080,004',
  '$60,080,005','$70,080,006','$80,080,007','$90,080,008', '$100,080,009']
print(upper_val)#showing the upper_val list
['$10,080,000', '$20,080,001', '$30,080,002', '$40,080,003', '$50,080,004', '$60,080,005', '$70,080,006', '$80,080,007', '$90,080,008', '$100,080,009']

Checking the number of elements in the 'upper_val' list.

In [799]:
len(upper_val)
Out[799]:
10

Getting the Frequency Amount of the values between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.

In [230]:
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
count10 = 0
for i in cost: 
    if 90000 <= i <= 10080000:
        count1+=1
    if 10080001 <= i <= 20080001:
        count2+=1
    if 20080002 <= i <= 30080002:
        count3+=1
    if 30080003 <= i <= 40080003:
        count4+=1
    if 40080004 <= i <= 50080004:
        count5+=1
    if 50080005 <= i <= 60080005:
        count6+=1
    if 60080006 <= i <= 70080006:
        count7+=1
    if 70080007 <= i <= 80080007:
        count8+=1
    if 80080008 <= i <= 90080008:
        count9+=1
    if 90080009 <= i <= 100080009:
        count10+=1

freq_amount = [count1,count2,count3,count4,count5,count6,count7,count8,count9,count10]
print(freq_amount)#showing the freq_amount list
[30, 11, 4, 3, 0, 5, 1, 0, 0, 1]

Checking the number of elements in the 'freq_amount' list.

In [800]:
len(freq_amount)
Out[800]:
10

Getting the Frequency Percentage of the values between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.

In [231]:
freq_amount_percent_demo = [count1/55*100,count2/55*100,count3/55*100,count4/55*100,
                       count5/55*100,count6/55*100,count7/55*100,count8/55*100,
                       count9/55*100,count10/55*100]
# Rounding each percentage to the nearest whole number instead of typing the values by hand
freq_amount_percent_demo1 = [round(p) for p in freq_amount_percent_demo]
print(freq_amount_percent_demo1)#showing the freq_amount_percent_demo1 list
[55, 20, 7, 5, 0, 9, 2, 0, 0, 2]

Checking the number of elements in the 'freq_amount_percent_demo1' list.

In [801]:
len(freq_amount_percent_demo1)
Out[801]:
10

Turning the integers in the freq_amount_percent_demo1 list into strings with the percentage symbol.

In [232]:
freq_amount_percent = []
for i in freq_amount_percent_demo1:
    freq_amount_percent.append("{:}%".format(i))
print(freq_amount_percent)#showing the freq_amount_percent list
['55%', '20%', '7%', '5%', '0%', '9%', '2%', '0%', '0%', '2%']

Checking the number of elements in the 'freq_amount_percent' list.

In [803]:
len(freq_amount_percent)
Out[803]:
10

The Cumulative Function to get the cumulative sum of a list.

In [233]:
def Cumulative(lists):
    cu_list = []
    length = len(lists)
    cu_list = [sum(lists[0:x:1]) for x in range(0, length+1)]
    return cu_list[1:]
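
The standard library offers the same running total through itertools.accumulate, which adds each element once instead of re-summing a growing slice. The sketch below uses a hypothetical lowercase name, 'cumulative', to avoid clashing with the function above, and is checked against the frequency list printed earlier.

```python
from itertools import accumulate

def cumulative(values):
    """Running total of a list, e.g. [1, 2, 3] -> [1, 3, 6]."""
    return list(accumulate(values))

print(cumulative([30, 11, 4, 3, 0, 5, 1, 0, 0, 1]))
```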

Getting the Cumulative Frequency Amount of the values between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.

In [234]:
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[30, 41, 45, 48, 48, 53, 54, 54, 54, 55]

Checking the number of elements in the 'freq_cumulative_amount' list.

In [804]:
len(freq_cumulative_amount)
Out[804]:
10

Getting the Cumulative Frequency Percentage of the values between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.

In [235]:
freq_cumulative_percent_demo = Cumulative(freq_amount_percent_demo1)
print(freq_cumulative_percent_demo)#showing the freq_cumulative_percent_demo list
[55, 75, 82, 87, 87, 96, 98, 98, 98, 100]

Checking the number of elements in the 'freq_cumulative_percent_demo' list.

In [805]:
len(freq_cumulative_percent_demo)
Out[805]:
10

Turning the integers in the freq_cumulative_percent_demo list into strings with the percentage symbol.

In [236]:
freq_cumulative_percent = []
for i in freq_cumulative_percent_demo:
    freq_cumulative_percent.append("{:}%".format(i))
print(freq_cumulative_percent)#showing the freq_cumulative_percent list
['55%', '75%', '82%', '87%', '87%', '96%', '98%', '98%', '98%', '100%']

Checking the number of elements in the 'freq_cumulative_percent' list.

In [806]:
len(freq_cumulative_percent)
Out[806]:
10

Creating the Cumulative Frequency Distribution Table of all the expenses spent on the R-rated movies, using the necessary variables.

In [237]:
freq_cum_dis = pd.DataFrame({"Lower\nValue":lower_val,
                             "Upper\nValue":upper_val,
                             "Frequency (f)":freq_amount,
                             "Percentage (%)":freq_amount_percent,
                            "Cumulative\nFrequency":freq_cumulative_amount,
                            "Cumulative\nPercentage":freq_cumulative_percent})

The 'freq_cum_dis' table. (this table is interactive)

In [426]:
freq_cum_dis
Out[426]:
Lower Value Upper Value Frequency (f) Percentage (%) Cumulative Frequency Cumulative Percentage

Getting the Frequency Amount of the values within each of the Intervals for the Cumulative Relative Frequency Distribution Table.

In [239]:
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
count10 = 0
for i in cost: 
    if i < 10000000:
        count1+=1
    if 10000000 <= i < 20000000:
        count2+=1
    if 20000000 <= i < 30000000:
        count3+=1
    if 30000000 <= i < 40000000:
        count4+=1
    if 40000000 <= i < 50000000:
        count5+=1
    if 50000000 <= i < 60000000:
        count6+=1
    if 60000000 <= i < 70000000:
        count7+=1
    if 70000000 <= i < 80000000:
        count8+=1
    if 80000000 <= i < 90000000:
        count9+=1
    if 90000000 <= i <= 100000000:
        count10+=1
freq_amount = [count1,count2,count3,count4,count5,count6,count7,count8,count9,count10]
print(freq_amount)#showing the freq_amount list
[29, 9, 7, 2, 1, 4, 2, 0, 0, 1]
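
Because these intervals are all 10 million wide, the interval a budget belongs to can be computed by integer division rather than ten separate comparisons. This is a sketch under stated assumptions: the function name 'interval_counts' is hypothetical, budgets are non-negative whole numbers, and anything at or above 90 million is placed in the last interval (the cell above stops at 100 million, which is also the largest budget in the data).

```python
def interval_counts(values, width=10_000_000, n_intervals=10):
    """Count how many values fall in each equal-width interval."""
    counts = [0] * n_intervals
    for value in values:
        index = min(value // width, n_intervals - 1)  # cap at the last interval
        counts[index] += 1
    return counts

# The 55 R-rated budgets from the 'cost' list above.
cost = [
    100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52500000,
    40000000, 37500000, 31000000, 23000000, 22500000, 22500000, 21000000,
    20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000,
    11800000, 11000000, 10000000, 9400000, 8500000, 7000000, 5000000,
    4900000, 4750000, 4000000, 3500000, 3400000, 3300000, 3000000,
    2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 1987650,
    1500000, 1000000, 1000000, 1000000, 135000, 100000, 6000000,
    8500000, 20000000, 100000, 2700000, 11500000, 9000000,
]
print(interval_counts(cost))
```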

Checking the number of elements in the 'freq_amount' list.

In [815]:
len(freq_amount)
Out[815]:
10

Getting the Frequency Percentage of the values within each of the Intervals for the Cumulative Relative Frequency Distribution Table.

In [240]:
cum_rel_freq_demo = []
for i in freq_amount:cum_rel_freq_demo.append(i/55*100)
# Whole-number percentages entered by hand; the fourth interval (2/55 = 3.6%) is entered as 3, which keeps the total at 100
cum_rel_freq_demo1 = [53,16,13,3,2,7,4,0,0,2]
print(cum_rel_freq_demo1)#showing the cum_rel_freq_demo1 list
[53, 16, 13, 3, 2, 7, 4, 0, 0, 2]

Checking the number of elements in the 'cum_rel_freq_demo1' list.

In [816]:
len(cum_rel_freq_demo1)
Out[816]:
10

Getting the Cumulative Relative Frequency Percentage of the values within the Intervals for the Cumulative Relative Frequency Distribution Table.

In [241]:
cum_rel_freq_demo2 = Cumulative(cum_rel_freq_demo1)
print(cum_rel_freq_demo2)#showing the cum_rel_freq_demo2 list
[53, 69, 82, 85, 87, 94, 98, 98, 98, 100]

Checking the number of elements in the 'cum_rel_freq_demo2' list.

In [817]:
len(cum_rel_freq_demo2)
Out[817]:
10

Turning the integers in the cum_rel_freq_demo2 list into strings with the percentage symbol.

In [242]:
cum_rel_freq_percent = []
for i in cum_rel_freq_demo2:
    cum_rel_freq_percent.append("{:}%".format(i))
print(cum_rel_freq_percent)#showing the cum_rel_freq_percent list
['53%', '69%', '82%', '85%', '87%', '94%', '98%', '98%', '98%', '100%']

Checking the number of elements in the 'cum_rel_freq_percent' list.

In [818]:
len(cum_rel_freq_percent)
Out[818]:
10

Getting the Cumulative Frequency of the values within the Intervals for the Cumulative Relative Frequency Distribution Table.

In [243]:
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[29, 38, 45, 47, 48, 52, 54, 54, 54, 55]

Checking the number of elements in the 'freq_cumulative_amount' list.

In [819]:
len(freq_cumulative_amount)
Out[819]:
10

Finalizing the Intervals for the Cumulative Relative Frequency Distribution Table.

In [244]:
intervals_cum = [ '< $10 Million','10 to < $20 Million','20 to < $30 Million',
                 '30 to < $40 Million',
 '40 to < $50 Million', '50 to < $60 Million', '60 to < $70 Million',
                 '70 to < $80 Million',
 '80 to < $90 Million','90 to $100 Million']
print(intervals_cum)#showing the intervals_cum list
['< $10 Million', '10 to < $20 Million', '20 to < $30 Million', '30 to < $40 Million', '40 to < $50 Million', '50 to < $60 Million', '60 to < $70 Million', '70 to < $80 Million', '80 to < $90 Million', '90 to $100 Million']

Checking the number of elements in the 'intervals_cum' list.

In [820]:
len(intervals_cum)
Out[820]:
10

Creating the Cumulative Relative Frequency Distribution Table of all the expenses spent on the R-rated movies, using the necessary variables.

In [245]:
cum_rel_freq = pd.DataFrame({"Amount of Budget":intervals_cum,
                             "Frequency (f)":freq_amount,
                             "Cumulative Frequency":freq_cumulative_amount,
                             "Cumelative Relative Frequency Percentage":cum_rel_freq_percent,
                            })

The 'cum_rel_freq' table. (this table is interactive)

In [427]:
cum_rel_freq
Out[427]:
Amount of Budget Frequency (f) Cumulative Frequency Cumelative Relative Frequency Percentage

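Visualizing the Normal Distribution of all the expenses spent on the R-rated movies. The curve is drawn in millions of dollars, with the mean rounded to 16 and the standard deviation rounded to 21.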

In [400]:
means = '16,455,866'
std = '20,757,148'

def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))

def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-70, 70)
    s = [21]
    m = [16]
    c = ['#ff5500']


    for sig, mu, color in zip(s, m, c): 
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")  # backslashes doubled so \mu and \sigma are not read as escape sequences

   
    plt.xlim(-70, 70)
    plt.ylim(0, .2)
    plt.legend(fontsize=11)
    plt.title('Variability of Cost of R-rated Movies, Normal\n Distribution, Mean = 16.5 million, StDev = 21 million',fontsize=14)
    plt.xlabel("Cost of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.grid(False)
    plt.savefig('variability_cost_r',bbox_inches='tight',facecolor='white', transparent=False)
    plt.show()
   
    
if __name__ == '__main__':
   main()
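
To see how tall the curve drawn above actually is, the same density formula can be evaluated at the mean, where it peaks at 1 / (sigma * sqrt(2 * pi)). This sketch re-implements the formula with the math module under the hypothetical name 'gauss_pdf' and compares it with the standard library's NormalDist.

```python
import math
from statistics import NormalDist

def gauss_pdf(x, sig, mu):
    """Normal density with mean mu and standard deviation sig, evaluated at x."""
    return 1 / (sig * math.sqrt(2 * math.pi)) * math.exp(-(x - mu) ** 2 / (2 * sig ** 2))

peak = gauss_pdf(16, 21, 16)  # density at the mean, in units of 1 / (millions of dollars)
reference = NormalDist(mu=16, sigma=21).pdf(16)
print(round(peak, 4), round(reference, 4))
```

The peak is roughly 0.019, so with the y-axis limit of 0.2 used above the curve fills only about the bottom tenth of the plot; a tighter limit such as 0.025 would likely show the shape more clearly.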

Visualizing the Variance of all the expenses spent on the R-rated movies.

In [399]:
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
import matplotlib.patches as mpatches
plt.ylabel('Cost of R-rated Movies',fontsize=14)
plt.xlabel('Ranking of Values',fontsize=14)
plt.title("The Variance of the Budget\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title 
plt.scatter(x=df_cost_r.index, y=df_cost_r['Cost'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_cost_r['Cost'].mean(), xmin=0, xmax=55, color='blue') # Mean line
plt.grid(False)
plt.annotate('CL - center line (Arithmetic Mean)',
            xy=(176, 102),
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',
            fontsize=11,)
plt.annotate('Data points',
            xy=(105, 180),
            color ='black',
            fontsize=11,
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',)

plt.savefig('variance_cost_r',bbox_inches='tight',facecolor='white', transparent=False)
plt.show()# Telling matplotlib to show the chart

Visualizing the Variance of all the expenses spent on the R-rated movies, with lines at one, two and three standard deviations from the mean.

In [398]:
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
plt.ylabel('Cost of R-rated Movies',fontsize=14)
plt.xlabel('Position of Values',fontsize=14)
plt.title("The Variance of the Budget\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title 
plt.scatter(x=df_cost_r.index, y=df_cost_r['Cost'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_cost_r['Cost'].mean(), xmin=0, xmax=55, color='blue') # Mean line

for std_int in [-3, -2, -1, 1, 2, 3]: # Going through different stds from the mean
    standard_deviation = df_cost_r['Cost'].mean() + df_cost_r['Cost'].std()*std_int
    if std_int in [1,2,-1,-2]:
        plt.hlines(y=standard_deviation,
               xmin=0,
               xmax=55,
               linestyles='dashed',
               colors='green'); # dashed green lines at 1 and 2 SD above and below the mean
    
    if std_int ==-3:
        plt.hlines(y=standard_deviation,
               xmin=0,
               xmax=55,
               colors='red',); # lower control limit, 3 SD below the mean
    if std_int == +3:
        plt.hlines(y=standard_deviation,
               xmin=0,
               xmax=55,
               colors='red'); # upper control limit, 3 SD above the mean
   
    # Giving labels to the lines we just drew
    #plt.text(y=standard_deviation + 2, x=-10, s=std_int, ha='center')
    plt.grid(False)
    

plt.annotate('UCL - upper control limit',
            xy=(84, 238),
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',
            fontsize=11,)
plt.annotate('LCL - lower control limit',
            xy=(84, 70),
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',
            fontsize=11,)
plt.annotate('+3 SD',
            xy=(355, 240),
            color ='purple',
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',
            fontsize=11,)
plt.annotate('+2 SD',
            xy=(355, 210),
            color ='purple',
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',
            fontsize=11,)
plt.annotate('+1 SD',
            xy=(355, 180),
            color ='purple',
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',
            fontsize=11,)
plt.annotate('CL - center line (Arithmetic Mean)',
            xy=(176, 152),
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',
            fontsize=11,)
plt.annotate('-3 SD',
            xy=(355, 70),
            color ='purple',
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',
            fontsize=11,)
plt.annotate('-2 SD',
            xy=(355, 98),
            color ='purple',
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',
            fontsize=11,)
plt.annotate('-1 SD',
            xy=(355, 125),
            color ='purple',
            xycoords='figure pixels',
            horizontalalignment='left',
            verticalalignment='top',
            fontsize=11,)
plt.savefig('variance_std_cost_r',bbox_inches='tight',facecolor='white', transparent=False)
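
The control limits drawn in the chart can also be computed as numbers, which makes it easy to list the budgets that fall outside them. This is a sketch: the function name 'control_limits' is hypothetical, and it uses the sample standard deviation (ddof=1), which is also the default of the pandas .std() call used for the plot.

```python
import statistics

def control_limits(values, k=3):
    """Return (lower limit, center line, upper limit) at k sample SDs from the mean."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)  # sample SD, ddof=1
    return center - k * spread, center, center + k * spread

# The 55 R-rated budgets from the 'cost' list above.
cost = [
    100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52500000,
    40000000, 37500000, 31000000, 23000000, 22500000, 22500000, 21000000,
    20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000,
    11800000, 11000000, 10000000, 9400000, 8500000, 7000000, 5000000,
    4900000, 4750000, 4000000, 3500000, 3400000, 3300000, 3000000,
    2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 1987650,
    1500000, 1000000, 1000000, 1000000, 135000, 100000, 6000000,
    8500000, 20000000, 100000, 2700000, 11500000, 9000000,
]
lcl, center, ucl = control_limits(cost)
outside = [value for value in cost if value < lcl or value > ucl]
print(outside)
```

Only the 100,000,000 budget lies beyond the upper control limit. The lower control limit is negative, so no budget can fall below it.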

Visualizing Pearson’s Coefficient of Skewness of all the expenses spent on the R-rated movies.

In [397]:
import matplotlib.pyplot as plt
# An "interface" to matplotlib.axes.Axes.hist() method
n, bins, patches = plt.hist(x=cost, bins='auto', color='#ff5500',
                            alpha=0.7, rwidth=0.85)
plt.grid(False)
plt.grid(axis='y', alpha=0.75)
plt.xlabel('Cost of R-rated Movies',fontsize=14)
plt.ylabel('Frequency',fontsize=14)
plt.title('The Pearson’s Coefficient of Skewness for the Budget\n of all R-rated movies is 1.07 (n=55)',fontsize=14)
plt.savefig('skew_cost_r.png', bbox_inches='tight',facecolor='white', transparent=False)

Visualizing the comparison of the Mode, Median and Mean of all the expenses spent on the R-rated movies.

In [456]:
# An "interface" to matplotlib.axes.Axes.hist() method
median_cost = statistics.median(cost)
mean_cost = 16455866
mode_cost = statistics.mode(cost)
n, bins, patches = plt.hist(x=cost, bins='auto', color='#ff5500',
                            alpha=0.2, rwidth=0.85)
plt.grid(axis='y', alpha=0.75)
names = ["median", "mean", "mode"]
colors = ['green', 'red', 'blue']
measurements = [median_cost, mean_cost, mode_cost]
for measurement, name, color in zip(measurements, names, colors):
    plt.axvline(x=measurement,  linestyle='--', linewidth=2.5, label='{0} at {1}'.format(name, measurement), c=color)
plt.legend(fontsize=10);

plt.xlabel('Cost of R-rated Movies',fontsize=14)
plt.ylabel('Frequency',fontsize=14)
plt.title('Comparison of Mode, Median and Mean in the\n Distribution of the Cost of all the R-rated Drama movies',fontsize=14)
plt.savefig('skewness2_cost_r', bbox_inches='tight')
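
The relative position of the three lines drawn above is what hints at skew: as a rule of thumb, a mean that sits to the right of the median suggests a long right tail. A minimal sketch of that check, using only the standard library; the helper `skew_direction` and the sample budgets are hypothetical and not part of the notebook:

```python
import statistics

def skew_direction(values):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "symmetric"

# Hypothetical budgets in dollars (illustrative only, not the notebook's data)
sample_budgets = [1_000_000, 2_000_000, 3_000_000, 5_000_000, 60_000_000]
print(skew_direction(sample_budgets))
```

One large budget is enough to pull the mean well above the median, which is the pattern the histogram is meant to expose.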

Visualizing Chebyshev’s Theorem for all the expenses spent on the R-rated movies (the range within 2 standard deviations of the mean).

In [425]:
means = '16,455,866'
std = '20,757,148'
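means1 = 16  # mean in $ millions (rounded)
std1 = 21    # standard deviation in $ millions (20.76 rounded), matching the curve drawn below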
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))

def main():

    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-90, 90)
    s = [21]
    m = [16]
    c = ['#ff5500']

    for sig, mu, color in zip(s, m, c): 
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")
    
    x = np.linspace(means1 - std1*2, means1 + std1*2)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    ax.annotate('at least 75%\n at least 41 obs', xy=(50,0.0075), xytext=(50,0.0125),
            arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
    
   
    plt.xlim(-100, 100)
    plt.ylim(0, .02)
    plt.legend(fontsize=10)
    plt.title("Chebyshev's Theorem on the Budget \nof the R-rated Movies in the Drama Genre (n=55)",fontsize=14)
    plt.xlabel("Cost of R-rated Movies",fontsize=14)
    plt.ylabel("Density", fontsize=14)
    plt.savefig('cheb_cost_r',bbox_inches='tight')
    plt.show()

if __name__ == '__main__':
   main()

Visualizing Chebyshev’s Theorem for all the expenses spent on the R-rated movies (the range between 2 and 3 standard deviations from the mean).

In [426]:
means = '16,455,866'
std = '20,757,148'
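means1 = 16  # mean in $ millions (rounded)
std1 = 21    # standard deviation in $ millions (20.76 rounded), matching the curve drawn below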
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))

def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-90, 90)
    s = [21]
    m = [16]
    c = ['#ff5500']

    for sig, mu, color in zip(s, m, c): 
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")
    
    x = np.linspace(57, 78)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    
    x1 = np.linspace(-26, -40)
    y1 = norm.pdf(x1, means1, std1)  # evaluate the density on x1, not on x
    ax.fill_between(x1, y1, alpha=0.5, color='#ff5500')

    ax.annotate('at least 13.9%\n at least 8 obs', xy=(70,0.0025), xytext=(50,0.0075),
            arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
   
    plt.xlim(-100, 100)
    plt.ylim(0, .0199)
    plt.legend(fontsize=10)
    plt.title("Chebyshev's Theorem on the Budget \nof the R-rated Movies in the Drama Genre (n=55)",fontsize=14)
    plt.xlabel("Cost of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.savefig('cheb2_cost_r',bbox_inches='tight')
    plt.show()

if __name__ == '__main__':
   main()
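
The edges of the shaded bands in the cell above are typed in by hand. As a sketch of an alternative, they could be derived from the mean and standard deviation; `sd_band` is a hypothetical helper, and the inputs are the rounded values in $ millions used for the curve:

```python
def sd_band(mean, std, k_low, k_high):
    """Return the two intervals lying between k_low and k_high standard
    deviations from the mean, as (left_band, right_band)."""
    left = (mean - k_high * std, mean - k_low * std)
    right = (mean + k_low * std, mean + k_high * std)
    return left, right

# Rounded values in $ millions, as used for the curve above
left_band, right_band = sd_band(mean=16, std=21, k_low=2, k_high=3)
print(left_band, right_band)
```

A budget cannot be negative, so in practice the left band would be clipped at 0.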

Visualizing the KDE and Jittered strip plot of all the expenses spent on the R-rated movies.

In [411]:
import seaborn as sns
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_cost_r, color='#ff5500');
sns.violinplot( data=df_cost_r,inner=None,color='0.8').set(title='KDE and Jittered strip plot\n on the budget of the r-rated movies')
plt.savefig('violin_cost_r')
plt.show()

Visualizing the KDE and Swarm plot of all the expenses spent on the R-rated movies.

In [673]:
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.swarmplot(data=df_cost_r, color='#ff5500');
sns.violinplot( data=df_cost_r, color='0.8', inner=None, alpha=.2).set(title='KDE and swarm plot\n on the budget of the r-rated movies')
#sns.despine()
plt.savefig('violin2_cost_r')
plt.show()

Visualizing the KDE and Rug plot of all the expenses spent on the R-rated movies.

In [674]:
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_cost_r, color='#ff5500', jitter=False)
sns.violinplot(data=df_cost_r,  split=True,inner=None,
      scale="count", color='0.8', alpha=.1).set(title='KDE and rug plot\n on the budget of the r-rated movies')
#sns.despine()
plt.savefig('violin3_cost_r')
plt.show()

Visualizing the KDE and Box plot of all the expenses spent on the R-rated movies.

In [833]:
sns.set(font_scale=.85)
plt.gcf().set_size_inches(4.2, 4)
sns.set_style("whitegrid")
ax = sns.violinplot( data=df_cost_r,color='#ff5500',fill=True,width=0.6,scale="width", inner=None)
sns.boxplot( data=df_cost_r, color='#ff5500', width=0.3,  ax=ax).set(title='KDE and Box plot\n on the budget of the r-rated movies')
for violin, alpha in zip(ax.collections[::2], [0.3]):violin.set_alpha(alpha)
plt.savefig('violin4_cost_r')

Visualizing the Kernel Density Estimation of all the expenses spent on the R-rated movies.

In [671]:
sns.set(font_scale=1.2)
sns.set_style("whitegrid")
# displot is figure-level and creates its own figure, so the size is set
# through height/aspect instead of plt.gcf(), which would leave an empty figure
sns.displot(df_cost_r, x="Cost",color='#ff5500', kind="kde",
           fill=True, height=6, aspect=5.8/6).set(title='KDE on the Budget of the R-rated Drama Movies')
plt.xlim(0, None)
plt.savefig('skewness3_cost_r')
plt.show()

The Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Micro-Budgets.

In [692]:
import matplotlib.pyplot as plt
from scipy.stats import bernoulli
#
# Instance of the Bernoulli distribution; p is taken from p_vals
#
bd = bernoulli(p_vals[0])  # about 4%
#
# Outcome of experiment can take value as 0, 1
#
X = [0, 1]
#
# Create a bar plot; Note the usage of "pmf" function
# to determine the probability of different values of
# random variable
#
plt.figure(figsize=(7,7))
plt.rcParams['axes.facecolor'] = '#FFFAF0'
plt.xlim(-1, 2)
plt.bar(X, bd.pmf(X), color='#ff5500')
plt.title('Bernoulli Distribution (p=0.036)', fontsize='15')
plt.xlabel('Values of Random Variable X (0, 1)', fontsize='15')
plt.ylabel('Probability', fontsize='15')
plt.rcParams["axes.edgecolor"] = "black"
plt.rc('grid', linestyle="-", color='grey',alpha=0.2)
plt.grid(True)
plt.savefig('Bernoulli_dist_cost_r',bbox_inches='tight')
plt.show()

The Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Low-Budgets.

In [693]:
import matplotlib.pyplot as plt
from scipy.stats import bernoulli
#
# Instance of the Bernoulli distribution; p is taken from p_vals
#
bd = bernoulli(p_vals[1])  # about 65%
#
# Outcome of experiment can take value as 0, 1
#
X = [0, 1]
#
# Create a bar plot; Note the usage of "pmf" function
# to determine the probability of different values of
# random variable
#
plt.figure(figsize=(7,7))
plt.rcParams['axes.facecolor'] = '#FFFAF0'
plt.xlim(-1, 2)
plt.bar(X, bd.pmf(X), color='#ff5500')
plt.title('Bernoulli Distribution (p=0.65)', fontsize='15')
plt.xlabel('Values of Random Variable X (0, 1)', fontsize='15')
plt.ylabel('Probability', fontsize='15')
plt.rcParams["axes.edgecolor"] = "black"
plt.rc('grid', linestyle="-", color='grey',alpha=0.2)
plt.grid(True)
plt.savefig('Bernoulli1_dist_cost_r',bbox_inches='tight')
plt.show()

The Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Mid-Budgets.

In [694]:
import matplotlib.pyplot as plt
from scipy.stats import bernoulli
#
# Instance of the Bernoulli distribution; p is taken from p_vals
#
bd = bernoulli(p_vals[2])  # about 18%
#
# Outcome of experiment can take value as 0, 1
#
X = [0, 1]
#
# Create a bar plot; Note the usage of "pmf" function
# to determine the probability of different values of
# random variable
#
plt.figure(figsize=(7,7))
plt.rcParams['axes.facecolor'] = '#FFFAF0'
plt.xlim(-1, 2)
plt.bar(X, bd.pmf(X), color='#ff5500')
plt.title('Bernoulli Distribution (p=0.18)', fontsize='15')
plt.xlabel('Values of Random Variable X (0, 1)', fontsize='15')
plt.ylabel('Probability', fontsize='15')
plt.rcParams["axes.edgecolor"] = "black"
plt.rc('grid', linestyle="-", color='grey',alpha=0.2)
plt.grid(True)
plt.savefig('Bernoulli2_dist_cost_r',bbox_inches='tight')
plt.show()

The Bernoulli Distribution of the Budgets of R-rated Drama Movies that are High-Budgets.

In [695]:
import matplotlib.pyplot as plt
from scipy.stats import bernoulli
#
# Instance of the Bernoulli distribution; p is taken from p_vals
#
bd = bernoulli(p_vals[3])  # about 13%
#
# Outcome of experiment can take value as 0, 1
#
X = [0, 1]
#
# Create a bar plot; Note the usage of "pmf" function
# to determine the probability of different values of
# random variable
#
plt.figure(figsize=(7,7))
plt.rcParams['axes.facecolor'] = '#FFFAF0'
plt.xlim(-1, 2)
plt.bar(X, bd.pmf(X), color='#ff5500')
plt.title('Bernoulli Distribution (p=0.13)', fontsize='15')
plt.xlabel('Values of Random Variable X (0, 1)', fontsize='15')
plt.ylabel('Probability', fontsize='15')
plt.rcParams["axes.edgecolor"] = "black"
plt.rc('grid', linestyle="-", color='grey',alpha=0.2)
plt.grid(True)
plt.savefig('Bernoulli3_dist_cost_r',bbox_inches='tight')
plt.show()
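
Each of the four cells above plots a Bernoulli distribution whose parameter is one entry of `p_vals`, which is built elsewhere in the notebook. Assuming each entry is simply the share of movies that fall in a budget category (an assumption, not something this section confirms), the two bar heights can be sketched without any plotting library; `bernoulli_pmf` and the counts below are hypothetical:

```python
def bernoulli_pmf(successes, total):
    """Return {0: P(X=0), 1: P(X=1)} for a Bernoulli trial whose success
    probability is estimated as successes / total."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p = successes / total
    return {0: 1 - p, 1: p}

# Hypothetical example: 10 of 55 movies fall in one budget category
pmf = bernoulli_pmf(10, 55)
print(round(pmf[1], 2), round(pmf[0], 2))
```

The bar at 1 is the probability that a randomly chosen movie belongs to the category, and the bar at 0 is its complement.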

Q-Q plot or Quantile plot for checking the distribution of all the expenses spent on the R-rated movies.

In [842]:
from scipy import stats
import matplotlib.style as style
plt.figure(figsize=(5,3))
stats.probplot(cost,plot=plt)
#ax = fig.subplots()
#ax = fig.add_subplot()
#fig, ax = plt.subplots()
#ax.get_lines()[0].set_markerfacecolor('C0')
plt.title("Distribution Plot of the Budgets of R-rated Drama Movies",fontsize=10)
plt.xlabel('Theoretical Quantiles',fontsize=10)
plt.ylabel('Ordered Values',fontsize=10)
plt.savefig('Probab_plot_r3',bbox_inches='tight',facecolor='white', transparent=False)
plt.show()

This is the HTML script that loads the Highcharts library to visualize the percentage of the Bernoulli Distribution on each Budget category (ranging from $100,000 to $50 Million), within the 'Drama_DataFrame' dataframe, using a 'Pie Chart'. The chart itself is drawn with the JavaScript below. (The graph below is interactive; you can hover over the pie chart.)

In [428]:
%%html
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
<figure class="highcharts-figure">
    <div id="containerr"></div>
</figure>

This is the JavaScript from the Highcharts library that visualizes the 'Bernoulli Distribution of the Budget' of all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Pie Chart'. It draws into the HTML container created above.

In [429]:
%%js
Highcharts.chart('containerr', {
    chart: {
        width:950,
        height:500,
        styledMode: false,
        plotBackgroundColor: null,
        plotBorderWidth: null,
        plotShadow: false,
        type: 'pie'
       
    },
    title: {
        text: 'The Bernoulli Distribution on the Budgets of R-rated Drama Movies'
    },
    tooltip: {
        pointFormat: '{series.name}: <b>{point.percentage:.1f}%</b>'
    },
    legend: {
        enabled: true,
        verticalAlign: 'bottom',
        symbolRadius: 20,
        reversed: true
    },
    accessibility: {
        point: {
            valueSuffix: '%'
        }
    },
    plotOptions: {
        pie: {
            allowPointSelect: true,
            cursor: 'pointer',
            dataLabels: {
                enabled: true,
                format: '<b>{point.name}</b>: {point.percentage:.1f} %'
                
            },
            showInLegend: true
        }
    },
    series: [{
        name: 'Budget Category',
        colorByPoint: true,
        colors: ['#ba450b','#ff5500','#e8946a','#f0c3ad'],
        data: [{
            name:'Micro Budget: <br>$0 to $100,000',
            y: 3
        }, {
            name: 'Low Budget: <br>$100,000 to $15 Million',
            y: 35
        }, {
            name: 'Mid Budget: <br>$15 Million to $50 Million',
            y: 10,
            sliced: true,
            selected: true
        }, {
            name: 'High Budget: <br>$50 Million+',
            y: 7
        }]
    }]
});

This is the HTML script that loads the Highcharts library to visualize the percentage of the Bernoulli Distribution on each Budget category (comparing each sub-category to the entire dataframe and to its main category), within the 'Drama_DataFrame' dataframe, using a 'Column Chart'. The charts themselves are drawn with the JavaScript below. (The graphs below are interactive; you can hover over the column charts.)

In [430]:
%%html
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>

<table><tr><th></th><th></th><th></th><th></th></tr><tr><th></th><th></th><th></th><th></th></tr>
    <tr>
    <td><div id="container"></div></td>
    <td><div id="container1"></div></td>
    <td><div id="container2"></div></td>
    </tr>
</table>

This is the JavaScript from the Highcharts library that visualizes the 'Bernoulli Distribution of the Budget' for the sub-groups of the Low-Budget category of the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Column Chart'.

In [431]:
%%js
Highcharts.chart('container', {
    chart: {
        type: 'column',
    },
    title: {
        text: 'The Bernoulli Distribution on <br>the Sub-groups in the Low-Budget Category'
    },
    subtitle: {
        text: 'R-rated Drama Movies'
    },
    xAxis: {
        type: 'category',
        labels: {
            style: {
                fontSize: '13px',
                fontFamily: 'Verdana, sans-serif'
            }
        }
    },
    yAxis: {
        min: 0,
        title: {
            text: 'Probability (%)'
        }
    },
    legend: {
        enabled: true,
        verticalAlign: 'bottom',
        symbolRadius: 20,
        reversed: true
    },
    plotOptions: {
        series: {
            borderWidth: 0,
            dataLabels: {
                enabled: true,
                format: '{point.y:.0f}%'
            }
        }
    },
    tooltip: {
        pointFormat: 'Percentage: <b>{point.y:.0f} %</b>'
    },
    series: [{
        color: '#ff5500',
        name: 'Compared to the Sub-groups within the Low-Budget category',
        data: [
            ['$1 Million to $5 Million', 57],
            ['$5 Million to $10 Million', 20],
            ['$10 Million to $15 Million', 23]
        ]
    }, {
        color: '#ba450b',
        name: 'Compared to the entire Data set',
        data: [
            ['$1 Million to $5 Million', 36],
            ['$5 Million to $10 Million', 13],
            ['$10 Million to $15 Million', 15]
        ]
    }]
});

This is the JavaScript from the Highcharts library that visualizes the 'Bernoulli Distribution of the Budget' for the sub-groups of the Mid-Budget category of the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Column Chart'.

In [432]:
%%js
Highcharts.chart('container1', {
    chart: {
        type: 'column',
    },
    title: {
        text: 'The Bernoulli Distribution on <br>the Sub-groups in the Mid-Budget Category'
    },
    subtitle: {
        text: 'R-rated Drama Movies'
    },
    xAxis: {
        type: 'category',
        labels: {
            style: {
                fontSize: '13px',
                fontFamily: 'Verdana, sans-serif'
            }
        }
    },
    yAxis: {
        min: 0,
        title: {
            text: 'Probability (%)'
        }
    },
    legend: {
        enabled: true,
        verticalAlign: 'bottom',
        symbolRadius: 20,
        reversed: true
    },
    plotOptions: {
        series: {
            borderWidth: 0,
            dataLabels: {
                enabled: true,
                format: '{point.y:.0f}%'
            }
        }
    },
    tooltip: {
        pointFormat: 'Percentage: <b>{point.y:.0f} %</b>'
    },
    series: [{
        color: '#e8946a',
        name: 'Compared to the Sub-groups within the Mid-Budget category',
        data: [
            ['$15 Million to $20 Million', 30],
            ['$20 Million to $30 Million', 40],
            ['$30 Million to $50 Million', 30]
        ]
    }, {
        color: '#ba450b',
        name: 'Compared to the entire Data set',
        data: [
            ['$15 Million to $20 Million', 6],
            ['$20 Million to $30 Million', 7],
            ['$30 Million to $50 Million', 6]
        ]
    }]
});

This is the JavaScript from the Highcharts library that visualizes the 'Bernoulli Distribution of the Budget' for the sub-groups of the High-Budget category of the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Column Chart'.

In [433]:
%%js
Highcharts.chart('container2', {
    chart: {
        type: 'column',
        
    },
    title: {
        text: 'The Bernoulli Distribution on <br>the Sub-groups in the High-Budget Category'
    },
    subtitle: {
        text: 'R-rated Drama Movies'
    },
    xAxis: {
        type: 'category',
        labels: {
            style: {
                fontSize: '13px',
                fontFamily: 'Verdana, sans-serif'
            }
        }
    },
    yAxis: {
        min: 0,
        title: {
            text: 'Probability (%)'
        }
    },
    legend: {
        enabled: true,
        verticalAlign: 'bottom',
        symbolRadius: 20,
        reversed: true
    },
    plotOptions: {
        series: {
            borderWidth: 0,
            dataLabels: {
                enabled: true,
                format: '{point.y:.0f}%'
            }
        }
    },
    tooltip: {
        pointFormat: 'Percentage: <b>{point.y:.0f} %</b>'
    },
    series: [{
        color: '#f0c3ad',
        name: 'Compared to the Sub-groups within the High-Budget category',
        data: [
            ['$50 Million to $60 Million', 72],
            ['$60 Million to $70 Million', 14],
            ['$90 Million to $100 Million', 14]
        ]
    }, {
        color: '#ba450b',
        name: 'Compared to the entire Data set',
        data: [
            ['$50 Million to $60 Million', 9],
            ['$60 Million to $70 Million', 2],
            ['$90 Million to $100 Million', 2]
        ]
    }]
});

Styling the first portion of the Frequency Distribution Table of all the expenses spent on the R-rated movies.

In [883]:
freq_dis_cost_r = freq_dis[:8].style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},  # heading
            {'selector':"td", "props":[("background-color","white"), ("color","black")]},  # table cells
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]},
                            ])

Saving the freq_dis_cost_r dataframe to the freq_dis_cost_r.png file as an image to be used for the analysis later on.

In [893]:
dfi.export(freq_dis_cost_r, 'freq_dis_cost_r.png')

The 'freq_dis_cost_r' dataframe.

Styling the second portion of the Frequency Distribution Table of all the expenses spent on the R-rated movies.

In [885]:
freq1_dis_cost_r = freq_dis[8:].style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},  # heading
            {'selector':"td", "props":[("background-color","white"), ("color","black")]},  # table cells
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]},])

Saving the freq1_dis_cost_r dataframe to the freq1_dis_cost_r.png file as an image to be used for the analysis later on.

In [886]:
dfi.export(freq1_dis_cost_r, 'freq1_dis_cost_r.png')

The 'freq1_dis_cost_r' dataframe.

Styling the Cumulative Frequency Distribution Table of all the expenses spent on the R-rated movies.

In [887]:
freq_cum_dis_cost_r = freq_cum_dis.style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},  # heading
            {'selector':"td", "props":[("background-color","white"), ("color","black"),
                                      ("font-size", "10pt")]},  # table cells
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]},])

Saving the freq_cum_dis_cost_r dataframe to the freq_cum_dis_cost_r.png file as an image to be used for the analysis later on.

In [888]:
dfi.export(freq_cum_dis_cost_r, 'freq_cum_dis_cost_r.png')

The 'freq_cum_dis_cost_r' dataframe.

Styling the Cumulative Relative Frequency Distribution Table of all the expenses spent on the R-rated movies.

In [891]:
cum_rel_freq_cost_r = cum_rel_freq.style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},  # heading
            {'selector':"td", "props":[("background-color","white"), ("color","black"),
                                      ("font-size", "10pt")]},  # table cells
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]},])

Saving the cum_rel_freq_cost_r dataframe to the cum_rel_freq_cost_r.png file as an image to be used for the analysis later on.

In [892]:
dfi.export(cum_rel_freq_cost_r, 'cum_rel_freq_cost_r.png')

The 'cum_rel_freq_cost_r' dataframe.

Cumulative Relative Frequency Distribution Line Plot of all the expenses spent on the R-rated movies.

In [486]:
# Set up the axes and figure
fig, ax = plt.subplots()
amount = [10000000, 20000000, 30000000, 40000000, 50000000, 60000000, 70000000, 80000000, 
       90000000, 100000000]
freq = [53, 69, 82, 85, 87, 94, 98, 98, 98, 100]
x = ['$10M to < $20M','$30M to < $40M','$50M to < $60M','$70M to < $80M','>= $100M']
plt.plot( amount, freq ,color='#ff5500', marker='o')
plt.title('Cumulative relative frequency (%) of \n the amount of budget spent on R-rated movies', fontsize=14)
plt.xlabel('Amount of Budget', fontsize=14)
plt.ylabel('Cumulative relative frequency (%)', fontsize=14)
plt.grid(True)
#plt.xticks(x, rotation = 45)
plt.subplots_adjust(bottom=spacing)
plt.show()
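
The percentages plotted above are entered by hand. As a sketch of how such a series can be derived, the per-bin counts can be accumulated and divided by the total; `cumulative_relative_frequency` and the counts below are hypothetical and are not the notebook's table:

```python
from itertools import accumulate

def cumulative_relative_frequency(counts):
    """Turn per-bin counts into cumulative percentages of the total."""
    total = sum(counts)
    return [round(100 * running / total) for running in accumulate(counts)]

# Hypothetical per-bin counts (illustrative only, not the notebook's table)
bin_counts = [5, 3, 2]
print(cumulative_relative_frequency(bin_counts))
```

The last entry is always 100, which makes a quick sanity check for a hand-typed series.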

Getting the ROI generated by all of the R-rated movies.

In [254]:
roi = []
for i in system_rating_r['Return On Investment']:
    i = int(i.replace('$', '').replace(',', ''))
    roi.append(i)
print(roi)#showing the roi list
[349948323, 307567189, 24154026, 326398492, 316350619, 19966854, 82112435, 530998101, 13147416, 129558438, 54735925, 9898681, 8554727, 17017873, 26604054, 8270399, 318266710, 25358392, 23262783, 7859167, 23830713, 31043521, 45178935, 60133905, 12417298, 69233867, 3765283, 12499242, 12636004, 222016, 53273049, 36954520, 17033227, 35669037, 20251930, 14610760, 14131551, 9295324, 8153415, 88390, 4328516, 19282640, 12744931, 15566240, 4438911, 156309, 294448, 2669782, 48766923, 68711836, 14718173, 1851683, 556082, 1500000, 2000000]

Checking the number of elements in the 'roi' list.

In [255]:
len(roi)
Out[255]:
55

Putting the ROI of all the R-rated movies into a dataframe called df_roi_r.

In [256]:
df_roi_r = pd.DataFrame({"ROI":roi})

The 'df_roi_r' dataframe. (this dataframe is interactive)

In [434]:
df_roi_r
Out[434]:
ROI

Getting the Arithmetic Mean of the ROI generated by all of the R-rated movies.

In [419]:
x = statistics.mean(roi)
print("Arithmetic Mean of the ROI for the R-rated movies is:", x)
Arithmetic Mean of the ROI for the R-rated movies is: 59600710.27272727

Getting the Median of the ROI generated by all of the R-rated movies.

In [420]:
print("Median of the ROI for the R-rated movies is:", statistics.median(roi))
Median of the ROI for the R-rated movies is: 17017873

Getting the Mode of the ROI generated by all of the R-rated movies.

In [421]:
print("Mode of the ROI for the R-rated movies is:",statistics.mode(roi))
Mode of the ROI for the R-rated movies is: 349948323
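
The mode reported above equals the first element of `roi`. That is consistent with how `statistics.mode` behaves on Python 3.8 and later when no value repeats, which appears to be the case for this list. A small sketch with hypothetical values shows the behaviour and how `statistics.multimode` makes the tie visible:

```python
import statistics

# Hypothetical values in which nothing repeats
values = [349, 307, 24, 326]

# With no repeated value, every element is tied for "most common";
# statistics.mode (Python 3.8+) returns the first one it encountered.
print(statistics.mode(values))

# statistics.multimode lists all tied values, making the tie visible.
print(statistics.multimode(values))
```

When `multimode` returns every element, the data has no meaningful mode and the single value from `mode` should be read with care.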

Getting the Standard Deviation of the ROI generated by all of the R-rated movies.

In [422]:
print("Standard deviation of the ROI for the R-rated movies is:", np.std(roi, ddof=1))
Standard deviation of the ROI for the R-rated movies is: 111311472.60911952

Getting the Coefficient of Variation of the ROI generated by all of the R-rated movies.

In [423]:
cv = lambda x: np.std(x, ddof=1) / np.mean(x) * 100
print("Coefficient of Variation of the ROI for the R-rated movies is:", cv(roi))
Coefficient of Variation of the ROI for the R-rated movies is: 186.76199008328697

Getting the Pearson’s Coefficient of Skewness of the ROI generated by all of the R-rated movies.

In [424]:
def pearsons(mean, median, standard_deviation):
    skewness = (mean-median)*3/standard_deviation
    return skewness
print("Pearson’s Coefficient of Skewness of the ROI for the R-rated movies is:", 
      pearsons( statistics.mean(roi),statistics.median(roi),np.std(roi, ddof=1))) 
Pearson’s Coefficient of Skewness of the ROI for the R-rated movies is: 1.147667071720293

Getting the Chebyshev’s Theorem ranges of the ROI generated by all of the R-rated movies.

In [425]:
def chebyshevs(mean, standard_deviation, num_std, previous_p):
    position_std = num_std*standard_deviation
    lower_range = mean - position_std
    if lower_range < 0: lower_range = 0  # the range is floored at 0
    upper_range = mean + position_std
    if num_std == 2: 
        print('At least 75% of the ROI of the r-rated movies ranges from',lower_range,'to',upper_range)
    if num_std == 3: 
        print('At least 13.9% of the ROI of the r-rated movies ranges from',previous_p,'to',upper_range)
chebyshevs(59600710, 111311472, 2, 0)
chebyshevs(59600710, 111311472, 3, 282223654)
At least 75% of the ROI of the r-rated movies ranges from 0 to 282223654
At least 13.9% of the ROI of the r-rated movies ranges from 282223654 to 393535126
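
The percentages in these messages come from Chebyshev's inequality, which states that for any distribution at least 1 - 1/k² of the observations lie within k standard deviations of the mean. A minimal sketch of that bound; `chebyshev_lower_bound` is a hypothetical helper, not part of the notebook:

```python
def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations of the
    mean, for any distribution (only informative when k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

within_2 = chebyshev_lower_bound(2)
within_3 = chebyshev_lower_bound(3)
print(f"{within_2:.1%}, {within_3:.1%}, gap {within_3 - within_2:.1%}")
```

The 13.9% printed above corresponds to the gap between the two bounds (88.9% minus 75%). The theorem guarantees a minimum share inside each interval, so the gap is best read as a difference of guarantees rather than a guaranteed share of the band between 2 and 3 standard deviations.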

Getting the Kurtosis of the ROI generated by all of the R-rated movies.

In [426]:
print('Kurtosis of the ROI of the r-rated movies is:',kurtosis(roi, fisher=False))
print('Excess Kurtosis of the ROI of the r-rated movies is:',
      (kurtosis(roi,fisher=False)-3))#leptokurtic
Kurtosis of the ROI of the r-rated movies is: 9.143154378370438
Excess Kurtosis of the ROI of the r-rated movies is: 6.143154378370438
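
With `fisher=False`, `scipy.stats.kurtosis` reports Pearson's kurtosis, for which a normal distribution scores 3; subtracting 3 gives the excess kurtosis printed above. As a sketch of what is being computed, the fourth standardized moment can be written with population moments, which is what SciPy's default `bias=True` uses; `pearson_kurtosis` and the sample are hypothetical:

```python
def pearson_kurtosis(values):
    """Fourth standardized moment, using population (biased) moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2**2

sample = [1, 2, 3, 4, 5]  # hypothetical data
k = pearson_kurtosis(sample)
print(k, k - 3)  # Pearson's kurtosis, then excess kurtosis
```

A positive excess value, as reported for the ROI, indicates heavier tails than a normal distribution (leptokurtic).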

Getting the Arithmetic Mean and the Trimmed Mean of the ROI generated by all of the R-rated movies.

In [427]:
print("Arithmetic Mean of the ROI for the R-rated movies is:", statistics.mean(roi))
print('10% Trimmed mean of the ROI of the r-rated movies is:',stats.trim_mean(roi, 0.10))
Arithmetic Mean of the ROI for the R-rated movies is: 59600710.27272727
10% Trimmed mean of the ROI of the r-rated movies is: 31883546.111111112
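
The trimmed mean is much lower than the arithmetic mean because a few very large values are dropped before averaging. `stats.trim_mean(roi, 0.10)` removes `int(0.10 * n)` observations from each end of the sorted data, which is 5 from each end when n = 55. A standard-library sketch of the same idea; `trimmed_mean` and the sample are hypothetical:

```python
def trimmed_mean(values, proportion):
    """Mean after dropping int(proportion * n) values from each end."""
    ordered = sorted(values)
    cut = int(proportion * len(ordered))
    kept = ordered[cut:len(ordered) - cut]
    if not kept:
        raise ValueError("proportion too large: nothing left to average")
    return sum(kept) / len(kept)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical data with one outlier
print(sum(sample) / len(sample))   # plain mean, pulled up by the outlier
print(trimmed_mean(sample, 0.10))  # drops 1 and 100 before averaging
```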

Rounding the ROI generated by the R-rated movies to the nearest million and storing it in a list called 'freq_demo'.

In [258]:
freq_demo = []
for i in roi:
    freq_demo.append((round(i, -6)))
print(freq_demo)#showing the freq_demo list
[350000000, 308000000, 24000000, 326000000, 316000000, 20000000, 82000000, 531000000, 13000000, 130000000, 55000000, 10000000, 9000000, 17000000, 27000000, 8000000, 318000000, 25000000, 23000000, 8000000, 24000000, 31000000, 45000000, 60000000, 12000000, 69000000, 4000000, 12000000, 13000000, 0, 53000000, 37000000, 17000000, 36000000, 20000000, 15000000, 14000000, 9000000, 8000000, 0, 4000000, 19000000, 13000000, 16000000, 4000000, 0, 0, 3000000, 49000000, 69000000, 15000000, 2000000, 1000000, 2000000, 2000000]

Checking the number of elements in the 'freq_demo' list.

In [259]:
len(freq_demo)
Out[259]:
55

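Replacing some elements in the 'freq_demo' list: the four values that rounded to 0 are replaced with their value rounded to the nearest $100,000, and the four values between $300 million and $350 million are regrouped to the nearest $50 million.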

In [260]:
freq_demo[-9] = 300000
freq_demo[-10] = 200000
freq_demo[-16] = 100000
freq_demo[-26] = 200000
freq_demo[1] = 300000000
freq_demo[4] = 300000000
freq_demo[3] = 350000000
freq_demo[16] = 300000000

Getting the Frequency of the Repeated Values of all the ROI generated by the R-rated Drama movies, which will be stored in a dictionary called 'freq_demo1'.

In [261]:
freq_demo1 = Counter((freq_demo))
print(freq_demo1)#showing the freq_demo1 list
Counter({300000000: 3, 13000000: 3, 8000000: 3, 4000000: 3, 2000000: 3, 350000000: 2, 24000000: 2, 20000000: 2, 9000000: 2, 17000000: 2, 12000000: 2, 69000000: 2, 200000: 2, 15000000: 2, 82000000: 1, 531000000: 1, 130000000: 1, 55000000: 1, 10000000: 1, 27000000: 1, 25000000: 1, 23000000: 1, 31000000: 1, 45000000: 1, 60000000: 1, 53000000: 1, 37000000: 1, 36000000: 1, 14000000: 1, 100000: 1, 19000000: 1, 16000000: 1, 300000: 1, 3000000: 1, 49000000: 1, 1000000: 1})

Sorting the 'freq_demo1' dictionary in ascending order.

In [262]:
freq_one = sorted(freq_demo1.items(), key=lambda i: i[0])
print(freq_one)#showing the freq_one list
[(100000, 1), (200000, 2), (300000, 1), (1000000, 1), (2000000, 3), (3000000, 1), (4000000, 3), (8000000, 3), (9000000, 2), (10000000, 1), (12000000, 2), (13000000, 3), (14000000, 1), (15000000, 2), (16000000, 1), (17000000, 2), (19000000, 1), (20000000, 2), (23000000, 1), (24000000, 2), (25000000, 1), (27000000, 1), (31000000, 1), (36000000, 1), (37000000, 1), (45000000, 1), (49000000, 1), (53000000, 1), (55000000, 1), (60000000, 1), (69000000, 2), (82000000, 1), (130000000, 1), (300000000, 3), (350000000, 2), (531000000, 1)]

Creating a list called 'roi_freq_amount' with the frequency of the values from 'freq_one' list.

In [263]:
roi_freq_amount = []
for i in freq_one: 
    roi_freq_amount.append(i[1])
print(roi_freq_amount)#showing the roi_freq_amount list
[1, 2, 1, 1, 3, 1, 3, 3, 2, 1, 2, 3, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 3, 2, 1]

Checking the number of elements in the 'roi_freq_amount' list.

In [264]:
len(roi_freq_amount)
Out[264]:
36

Creating a list called 'roi_freq' with the ROI values of the R-rated Drama movies in the 'freq_one' list, formatted as dollar amounts.

In [265]:
roi_freq = []
for i in freq_one:
    roi_freq.append("${:,.0f}".format(i[0]))
print(roi_freq)#showing the roi_freq list
['$100,000', '$200,000', '$300,000', '$1,000,000', '$2,000,000', '$3,000,000', '$4,000,000', '$8,000,000', '$9,000,000', '$10,000,000', '$12,000,000', '$13,000,000', '$14,000,000', '$15,000,000', '$16,000,000', '$17,000,000', '$19,000,000', '$20,000,000', '$23,000,000', '$24,000,000', '$25,000,000', '$27,000,000', '$31,000,000', '$36,000,000', '$37,000,000', '$45,000,000', '$49,000,000', '$53,000,000', '$55,000,000', '$60,000,000', '$69,000,000', '$82,000,000', '$130,000,000', '$300,000,000', '$350,000,000', '$531,000,000']

Checking the number of elements in the 'roi_freq' list.

In [266]:
len(roi_freq)
Out[266]:
36

Creating a Frequency Distribution Table called 'freq_dis_roi', of all the ROI generated by the R-rated movies.

In [282]:
freq_dis_roi = pd.DataFrame({"Return On Investment (x)":roi_freq,
                                 "Frequency (f)":roi_freq_amount})

The 'freq_dis_roi' dataframe. (this dataframe is interactive)

In [435]:
freq_dis_roi
Out[435]:
Return On Investment (x) Frequency (f)

Getting the Upper Values and Lower Values of all the ROI generated on all of the R-rated movies, for the Cumulative Frequency Distribution Table.

In [972]:
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
        
a =list(chunks(range(70000, 350000000), 30000000))
a#showing the a list
Out[972]:
[range(70000, 30070000),
 range(30070000, 60070000),
 range(60070000, 90070000),
 range(90070000, 120070000),
 range(120070000, 150070000),
 range(150070000, 180070000),
 range(180070000, 210070000),
 range(210070000, 240070000),
 range(240070000, 270070000),
 range(270070000, 300070000),
 range(300070000, 330070000),
 range(330070000, 350000000)]
In [269]:
vals = [70000,  30070000, 30070001,  60070001, 60070002,  90070002, 90070003,  120070003,
120070004,  150070004, 150070005,  180070005, 180070006, 210070006,  210070007,240070007, 240070008,
270070008,  270000009, 300070009, 300070010, 330070010, 330070011, 360070011, 360070012, 
        390070012, 390070013, 410070013, 410070014, 440007014, 440007015, 470007015, 470007016, 
        500070016,500070017, 530070017, 530070018, 560070019 ]
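
Typing the class limits by hand makes it easy to slip a digit. A minimal sketch of generating the same kind of limits programmatically, assuming every class has the same width of 30,000,000 and each lower limit is one more than the previous upper limit. The function name is illustrative, and the hand-typed list above does not follow this pattern in every class.

```python
def make_class_limits(start, width, count):
    """Return (lower, upper) pairs for consecutive, non-overlapping classes."""
    limits = []
    lower = start
    for _ in range(count):
        upper = lower + width
        limits.append((lower, upper))
        lower = upper + 1  # next class starts right after this one ends
    return limits


class_limits = make_class_limits(start=70_000, width=30_000_000, count=19)
for lower, upper in class_limits[:3]:
    print("${:,.0f} to ${:,.0f}".format(lower, upper))
```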

Finalizing the Lower Values for the Cumulative Frequency Distribution Table.

In [270]:
lower_vals = []
for i,x in enumerate(vals): 
    if (i%2) == 0:lower_vals.append("${:,.0f}".format(x)) 
print(lower_vals)#showing the lower_vals list
['$70,000', '$30,070,001', '$60,070,002', '$90,070,003', '$120,070,004', '$150,070,005', '$180,070,006', '$210,070,007', '$240,070,008', '$270,000,009', '$300,070,010', '$330,070,011', '$360,070,012', '$390,070,013', '$410,070,014', '$440,007,015', '$470,007,016', '$500,070,017', '$530,070,018']

Checking the number of elements in the 'lower_vals' list.

In [271]:
len(lower_vals)
Out[271]:
19

Finalizing the Upper Values for the Cumulative Frequency Distribution Table.

In [272]:
upper_vals = []
for i,x in enumerate(vals): 
    if (i%2) !=0: upper_vals.append("${:,.0f}".format(x))   
print(upper_vals)#showing the upper_vals list
['$30,070,000', '$60,070,001', '$90,070,002', '$120,070,003', '$150,070,004', '$180,070,005', '$210,070,006', '$240,070,007', '$270,070,008', '$300,070,009', '$330,070,010', '$360,070,011', '$390,070,012', '$410,070,013', '$440,007,014', '$470,007,015', '$500,070,016', '$530,070,017', '$560,070,019']

Checking the number of elements in the 'upper_vals' list.

In [555]:
len(upper_vals)
Out[555]:
19

Getting the Frequency Amount of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.

In [273]:
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
count10 = 0
count11 = 0
count12 = 0 
count13 = 0
count14 = 0
count15 = 0
count16 = 0
count17 = 0
count18 = 0
count19 = 0
# Each class runs from its lower value to its upper value (both inclusive),
# and each class starts one above the previous upper value, so no ROI
# is counted twice and no range between classes is skipped.
for i in roi: 
    if 70000 <= i <= 30070000:
        count1+=1
    if 30070001 <= i <= 60070001:
        count2+=1
    if 60070002 <= i <= 90070002:
        count3+=1
    if 90070003 <= i <= 120070003:
        count4+=1
    if 120070004 <= i <= 150070004:
        count5+=1
    if 150070005 <= i <= 180070005:
        count6+=1
    if 180070006 <= i <= 210070006:
        count7+=1
    if 210070007 <= i <= 240070007:
        count8+=1
    if 240070008 <= i <= 270070008:
        count9+=1
    if 270070009 <= i <= 300070009:
        count10+=1
    if 300070010 <= i <= 330070010:
        count11+=1
    if 330070011 <= i <= 360070011:
        count12+=1
    if 360070012 <= i <= 390070012:
        count13+=1
    if 390070013 <= i <= 410070013:
        count14+=1
    if 410070014 <= i <= 440070014:
        count15+=1
    if 440070015 <= i <= 470070015:
        count16+=1
    if 470070016 <= i <= 500070016:
        count17+=1
    if 500070017 <= i <= 530070017:
        count18+=1
    if 530070018 <= i <= 560070018:
        count19+=1
freq_amount = [count1,count2,count3,count4,count5,count6,count7,
               count8,count9,count10,count11,count12, 
               count13,count14,count15,count16,count17,count18,count19]
print(freq_amount)#showing the freq_amount list
[37, 7, 4, 0, 1, 0, 0, 0, 0, 0, 4, 1, 0, 0, 0, 0, 0, 0, 1]

Checking the number of elements in the 'freq_amount' list.

In [274]:
len(freq_amount)
Out[274]:
19

Getting the Frequency Percentage of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.

In [275]:
freq_amount_percent_demo = [count1/55*100,count2/55*100,count3/55*100,count4/55*100,
                       count5/55*100,count6/55*100,count7/55*100,count8/55*100,
                       count9/55*100, count10/55*100, count11/55*100, count12/55*100,
                count13/55*100,count14/55*100,count15/55*100,count16/55*100,
                       count17/55*100,count18/55*100,count19/55*100,]

freq_amount_percent_demo1 = [67, 13, 8, 0, 2, 0, 0, 0, 0, 0, 7, 2,
                            0, 0, 0, 0, 0, 0, 2]
print(freq_amount_percent_demo1)#showing the freq_amount_percent_demo1 list
[67, 13, 8, 0, 2, 0, 0, 0, 0, 0, 7, 2, 0, 0, 0, 0, 0, 0, 2]
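
The percentages above were typed in by hand. A short sketch of deriving them directly from the counts, assuming rounding to the nearest whole percent. It uses the `freq_amount` values printed earlier. Under that rounding, both classes with 4 movies come out at 7% (4/55 is about 7.27%) and the percentages sum to 100.

```python
# Counts copied from the printed freq_amount list
freq_amount = [37, 7, 4, 0, 1, 0, 0, 0, 0, 0, 4, 1, 0, 0, 0, 0, 0, 0, 1]

total = sum(freq_amount)  # 55 movies
percentages = [round(count / total * 100) for count in freq_amount]
percent_labels = ["{}%".format(p) for p in percentages]

print(percentages)
print(sum(percentages))
```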

Checking the number of elements in the 'freq_amount_percent_demo1' list.

In [976]:
len(freq_amount_percent_demo1)
Out[976]:
19

Turning the integers in the freq_amount_percent_demo1 list into strings with the percentage symbol.

In [276]:
freq_amount_percent = []
for i in freq_amount_percent_demo1:
    freq_amount_percent.append("{:}%".format(i))
print(freq_amount_percent)#showing the freq_amount_percent list
['67%', '13%', '8%', '0%', '2%', '0%', '0%', '0%', '0%', '0%', '7%', '2%', '0%', '0%', '0%', '0%', '0%', '0%', '2%']

Checking the number of elements in the 'freq_amount_percent' list.

In [979]:
len(freq_amount_percent)
Out[979]:
19

Getting the Cumulative Frequency Amount of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.

In [277]:
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[37, 44, 48, 48, 49, 49, 49, 49, 49, 49, 53, 54, 54, 54, 54, 54, 54, 54, 55]
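
`Cumulative` is a helper defined elsewhere in the notebook. A minimal equivalent sketch, assuming it simply returns running totals, can be written with `itertools.accumulate` from the standard library. The name `running_totals` is illustrative.

```python
from itertools import accumulate


def running_totals(values):
    """Return the list of cumulative sums of values."""
    return list(accumulate(values))


# Counts copied from the printed freq_amount list
freq_amount = [37, 7, 4, 0, 1, 0, 0, 0, 0, 0, 4, 1, 0, 0, 0, 0, 0, 0, 1]
print(running_totals(freq_amount))
```

The last running total equals the number of movies, which is a quick check that every ROI landed in exactly one class.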

Checking the number of elements in the 'freq_cumulative_amount' list.

In [982]:
len(freq_cumulative_amount)
Out[982]:
19

Getting the Cumulative Frequency Percentage of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.

In [278]:
freq_cumulative_percent_demo = Cumulative(freq_amount_percent_demo1)
print(freq_cumulative_percent_demo)#showing the freq_cumulative_percent_demo list
[67, 80, 88, 88, 90, 90, 90, 90, 90, 90, 97, 99, 99, 99, 99, 99, 99, 99, 101]
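
Summing percentages that were already rounded lets the rounding errors add up, which is why the last value reaches 101. One way to avoid this, sketched here under the assumption that whole-percent rounding is wanted, is to divide the cumulative counts by the total and round only at the end.

```python
from itertools import accumulate

# Counts copied from the printed freq_amount list
freq_amount = [37, 7, 4, 0, 1, 0, 0, 0, 0, 0, 4, 1, 0, 0, 0, 0, 0, 0, 1]

total = sum(freq_amount)
cumulative_counts = list(accumulate(freq_amount))
# Round once, after accumulating, so the final value is exactly 100
cumulative_percent = [round(c / total * 100) for c in cumulative_counts]

print(cumulative_percent)
```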

Checking the number of elements in the 'freq_cumulative_percent_demo' list.

In [983]:
len(freq_cumulative_percent_demo)
Out[983]:
19

Turning the integers in the freq_cumulative_percent_demo list into strings with the percentage symbol.

In [279]:
freq_cumulative_percent = []
for i in freq_cumulative_percent_demo:
    freq_cumulative_percent.append("{:}%".format(i))
print(freq_cumulative_percent)#showing the freq_cumulative_percent list
['67%', '80%', '88%', '88%', '90%', '90%', '90%', '90%', '90%', '90%', '97%', '99%', '99%', '99%', '99%', '99%', '99%', '99%', '101%']

Checking the number of elements in the 'freq_cumulative_percent' list.

In [985]:
len(freq_cumulative_percent)
Out[985]:
19

Creating the Cumulative Frequency Distribution Table of all the ROI generated by the R-rated movies, using the necessary variables.

In [280]:
freq_cum_dis1 = pd.DataFrame({"Lower\nValue":lower_vals,
                             "Upper\nValue":upper_vals,
                             "Frequency (f)":freq_amount,
                             "Percentage (%)":freq_amount_percent,
                            "Cumulative\nFrequency":freq_cumulative_amount,
                            "Cumulative\nPercentage":freq_cumulative_percent})

The 'freq_cum_dis1' table. (this table is interactive)

In [436]:
freq_cum_dis1
Out[436]:
Lower Value Upper Value Frequency (f) Percentage (%) Cumulative Frequency Cumulative Percentage

Getting the Frequency Amount of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.

In [284]:
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
count10 = 0
count11 = 0
count12 = 0
count13 = 0
for i in roi: 
    if i < 30000000:
        count1+=1
    if 30000000 <= i < 60000000:
        count2+=1
    if 60000000 <= i < 90000000:
        count3+=1
    if 90000000 <= i < 120000000:
        count4+=1
    if 120000000 <= i < 150000000:
        count5+=1
    if 150000000 <= i < 180000000:
        count6+=1
    if 180000000 <= i < 210000000:
        count7+=1
    if 210000000 <= i < 240000000:
        count8+=1
    if 240000000 <= i < 270000000:
        count9+=1
    if 270000000 <= i < 300000000:
        count10+=1
    if 300000000 <= i < 330000000:
        count11+=1
    if 330000000 <= i < 360000000:
        count12+=1
    if i >= 360000000:
        count13+=1
freq_amount = [count1,count2,count3,count4,count5,count6,count7,
               count8,count9,count10,count11,count12, count13]
print(freq_amount)#showing the freq_amount list
[37, 7, 4, 0, 1, 0, 0, 0, 0, 0, 4, 1, 1]
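
The chain of `if` statements can also be expressed as a lookup into a sorted list of class boundaries. A minimal sketch using `bisect_right` from the standard library, assuming half-open classes of the form lower <= x < upper. It runs on the rounded values printed earlier for 'freq_demo' as a stand-in for the `roi` list, which for this data yields the same counts as above.

```python
from bisect import bisect_right

# Rounded ROI values copied from the printed freq_demo list
rounded_roi = [350000000, 308000000, 24000000, 326000000, 316000000, 20000000,
               82000000, 531000000, 13000000, 130000000, 55000000, 10000000,
               9000000, 17000000, 27000000, 8000000, 318000000, 25000000,
               23000000, 8000000, 24000000, 31000000, 45000000, 60000000,
               12000000, 69000000, 4000000, 12000000, 13000000, 0, 53000000,
               37000000, 17000000, 36000000, 20000000, 15000000, 14000000,
               9000000, 8000000, 0, 4000000, 19000000, 13000000, 16000000,
               4000000, 0, 0, 3000000, 49000000, 69000000, 15000000, 2000000,
               1000000, 2000000, 2000000]

# Boundaries at 30M, 60M, ..., 360M give 13 classes (the last is >= 360M)
edges = [30_000_000 * k for k in range(1, 13)]

counts = [0] * (len(edges) + 1)
for value in rounded_roi:
    # bisect_right places a value equal to a boundary in the class above it
    counts[bisect_right(edges, value)] += 1

print(counts)
```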

Checking the number of elements in the 'freq_amount' list.

In [1033]:
len(freq_amount)
Out[1033]:
13

Getting the Frequency Percentage of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.

In [285]:
cum_rel_freq_demo = []
for i in freq_amount:cum_rel_freq_demo.append(i/55*100)
cum_rel_freq_demo1 = [67,13,7,0,2,0,0,0,0,0,7,2,2]
print(cum_rel_freq_demo1)#showing the cum_rel_freq_demo1 list
[67, 13, 7, 0, 2, 0, 0, 0, 0, 0, 7, 2, 2]

Checking the number of elements in the 'cum_rel_freq_demo1' list.

In [1034]:
len(cum_rel_freq_demo1)
Out[1034]:
13

Getting the Cumulative Relative Frequency Percentage of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.

In [286]:
cum_rel_freq_demo2 = Cumulative(cum_rel_freq_demo1)
print(cum_rel_freq_demo2)#showing the cum_rel_freq_demo2 list
[67, 80, 87, 87, 89, 89, 89, 89, 89, 89, 96, 98, 100]

Checking the number of elements in the 'cum_rel_freq_demo2' list.

In [1035]:
len(cum_rel_freq_demo2)
Out[1035]:
13

Turning the integers in the cum_rel_freq_demo2 list into strings with the percentage symbol.

In [287]:
cum_rel_freq_percent = []
for i in cum_rel_freq_demo2:
    cum_rel_freq_percent.append("{:}%".format(i))
print(cum_rel_freq_percent)#showing the cum_rel_freq_percent list
['67%', '80%', '87%', '87%', '89%', '89%', '89%', '89%', '89%', '89%', '96%', '98%', '100%']

Checking the number of elements in the 'cum_rel_freq_percent' list.

In [1037]:
len(cum_rel_freq_percent)
Out[1037]:
13

Getting the Cumulative Frequency of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.

In [288]:
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[37, 44, 48, 48, 49, 49, 49, 49, 49, 49, 53, 54, 55]

Checking the number of elements in the 'freq_cumulative_amount' list.

In [1038]:
len(freq_cumulative_amount)
Out[1038]:
13

Finalizing the Intervals for the Cumulative Relative Frequency Distribution Table.

In [289]:
intervals_cum = [ '< $30 Million','30 to < $60 Million','60 to < $90 Million',
                 '90 to < $120 Million','120 to < $150 Million',
 '150 to < $180 Million', '180 to < $210 Million', '210 to < $240 Million',
                 '240 to < $270 Million',
 '270 to < $300 Million','300 to < $330 Million','330 to < $360 Million','>= $360 Million']
print(intervals_cum)#showing the intervals_cum list
['< $30 Million', '30 to < $60 Million', '60 to < $90 Million', '90 to < $120 Million', '120 to < $150 Million', '150 to < $180 Million', '180 to < $210 Million', '210 to < $240 Million', '240 to < $270 Million', '270 to < $300 Million', '300 to < $330 Million', '330 to < $360 Million', '>= $360 Million']

Checking the number of elements in the 'intervals_cum' list.

In [1039]:
len(intervals_cum)
Out[1039]:
13

Creating the Cumulative Relative Frequency Distribution Table of all the ROI generated by the R-rated movies, using the necessary variables.

In [290]:
cum_rel_freq1 = pd.DataFrame({"Return On Investment":intervals_cum,
                             "Frequency (f)":freq_amount,
                             "Cumulative Frequency":freq_cumulative_amount,
                             "Cumulative Relative Frequency Percentage":cum_rel_freq_percent,
                            })

The 'cum_rel_freq1' table. (this table is interactive)

In [437]:
cum_rel_freq1
Out[437]:
Return On Investment Frequency (f) Cumulative Frequency Cumulative Relative Frequency Percentage

Visualizing the Normal Distribution of all the ROI generated by the R-rated movies.

In [439]:
means = '59,600,710'
std = '111,311,472'

def make_gauss(N, sig, mu):
    # Scaled normal density: N / (sig * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sig^2))
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))

def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-350, 350)  # x axis in millions of dollars
    m = [60]   # mean, in millions
    s = [111]  # standard deviation, in millions
    c = ['#ff5500']

    for sig, mu, color in zip(s, m, c): 
        gauss = make_gauss(1, sig, mu)(x)
        # Raw strings keep the LaTeX backslashes (\mu, \sigma) intact
        ax.plot(x, gauss, color, linewidth=2,
                label=rf"$\mu={means}$" + "\n" + rf"$\sigma={std}$" + "\n")

    plt.xlim(-350, 350)
    plt.ylim(0, .004)  # the peak density is about 0.0036 when sigma is 111
    plt.legend(fontsize=11)
    plt.title('Variability of ROI of R-rated Movies\n Normal Distribution, Mean=59.6 million, StDev=111.3 million',fontsize=14)
    plt.xlabel("ROI of R-rated Movies (millions)",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.grid(False)
    plt.show()

if __name__ == '__main__':
   main()
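
The height of the bell curve depends only on the standard deviation, which is useful to know when picking the y-axis limits. A small sketch using `statistics.NormalDist` from the standard library, assuming the same rounded parameters as the plot (mean 60 and standard deviation 111, both in millions). The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Rounded parameters in millions of dollars, as used in the plot
roi_model = NormalDist(mu=60, sigma=111)

# The density peaks at the mean with height 1 / (sigma * sqrt(2 * pi))
peak_density = roi_model.pdf(60)
expected_peak = 1 / (111 * math.sqrt(2 * math.pi))

# Share of a normal distribution within two standard deviations of the mean
share_within_2_std = roi_model.cdf(60 + 2 * 111) - roi_model.cdf(60 - 2 * 111)

print(round(peak_density, 6))
print(round(share_within_2_std, 4))
```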

Visualizing the Variance of all the ROI generated by the R-rated movies.

In [446]:
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
plt.ylabel('ROI of R-rated Movies',fontsize=14)
plt.xlabel('Ranking of Values',fontsize=14)
plt.title("The Variance of all the ROI\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title 
plt.scatter(x=df_roi_r.index, y=df_roi_r['ROI'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_roi_r['ROI'].mean(), xmin=0, xmax=55, color='blue') # Mean line
plt.grid(False)
plt.show()# Telling matplotlib to show the chart

Visualizing the Variance, using Two Standard Deviations, of all the ROI generated by the R-rated movies.

In [447]:
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
plt.ylabel('ROI of R-rated Movies',fontsize=14)
plt.xlabel('Position of Values',fontsize=14)
plt.title("The Variance of all the ROI\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title 
plt.scatter(x=df_roi_r.index, y=df_roi_r['ROI'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_roi_r['ROI'].mean(), xmin=0, xmax=55, color='blue') # Mean line

for std_int in [-2, -1, 1, 2]: # Going through different stds from the mean
    # ROI level that lies std_int standard deviations away from the mean
    std_level = df_roi_r['ROI'].mean() + df_roi_r['ROI'].std()*std_int
    
    plt.hlines(y=std_level,
               xmin=0,
               xmax=55,
               linestyles='dashed',
               colors='green')
    
    # Labelling each dashed line with its number of standard deviations
    plt.text(y=std_level + 2, x=-10, s=std_int, ha='center')
plt.grid(False)
plt.show()

Visualizing the Pearson’s Coefficient of Skewness of all the ROI generated by the R-rated movies.

In [448]:
import matplotlib.pyplot as plt

# An "interface" to matplotlib.axes.Axes.hist() method
n, bins, patches = plt.hist(x=roi, bins='auto', color='#ff5500',
                            alpha=0.7, rwidth=0.85)
plt.grid(False)
plt.grid(axis='y', alpha=0.75)
plt.xlabel('ROI of R-rated Movies',fontsize=14)
plt.ylabel('Frequency',fontsize=14)
#plt.text( x=np.min(cost), y=0.1, s=r'$\mu=16 million, b=20 million$')
plt.title('The Pearson’s Coefficient of Skewness for the ROI\n of all R-rated movies is 1.14 (n=55)',fontsize=14)
Out[448]:
Text(0.5, 1.0, 'The Pearson’s Coefficient of Skewness for the ROI\n of all R-rated movies is 1.14 (n=55)')

Visualizing the Comparison of Mode, Median and Mean of all the ROI generated by the R-rated movies.

In [449]:
# An "interface" to matplotlib.axes.Axes.hist() method
median_roi = statistics.median(roi)
mean_roi = 59600710
mode_roi = statistics.mode(roi)
n, bins, patches = plt.hist(x=roi, bins='auto', color='#ff5500',
                            alpha=0.2, rwidth=0.85)
plt.grid(axis='y', alpha=0.75)
names = ["median", "mean", "mode"]
colors = ['green', 'red', 'blue']
measurements = [median_roi, mean_roi, mode_roi]
for measurement, name, color in zip(measurements, names, colors):
    plt.axvline(x=measurement,  linestyle='--', linewidth=2.5, label='{0} at {1}'.format(name, measurement), c=color)
plt.legend(fontsize=10);
plt.title('Comparison of Mode, Median and Mean \nin the Distribution of the ROI of all the R-rated Drama movies',fontsize=14)
Out[449]:
Text(0.5, 1.0, 'Comparison of Mode, Median and Mean \nin the Distribution of the ROI of all the R-rated Drama movies')

Visualizing Chebyshev's Theorem for all the ROI generated by the R-rated movies (within two standard deviations of the mean).

In [450]:
means = '59,600,710'
std = '111,311,472'
means1 = 60
std1 = 111
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))

def main():

    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-400, 400)
    s = [111]
    m = [60]
    c = ['#ff5500']

    for sig, mu, color in zip(s, m, c): 
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2,
                label=rf"$\mu={means}$" + "\n" + rf"$\sigma={std}$" + "\n")
    
    x = np.linspace(means1 - std1*2, means1 + std1*2)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    ax.annotate('at least 75%\n at least 41 obs', xy=(250,0.0035), xytext=(250,0.0020),
            arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
    
   
    plt.xlim(-400, 400)
    plt.ylim(0, .004)
    plt.legend(fontsize=10)
    plt.title('Chebyshevs Theorem on the ROI \nof the R-rated Movies in the Drama Genre (n=55)',fontsize=14)
    plt.xlabel("ROI of R-rated Movies",fontsize=14)
    plt.ylabel("Density", fontsize=14)
    plt.show()

if __name__ == '__main__':
   main()
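
The "at least 75%" annotation comes from Chebyshev's bound, which states that for any distribution at least 1 - 1/k² of the observations lie within k standard deviations of the mean (for k > 1). A minimal sketch of that arithmetic, assuming n = 55 as in the plot title. The function name is illustrative.

```python
import math


def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2


n = 55  # number of R-rated movies
within_2_std = chebyshev_lower_bound(2)
within_3_std = chebyshev_lower_bound(3)

# Whole number of movies guaranteed to lie within two standard deviations
min_obs_within_2_std = math.floor(within_2_std * n)

# Difference between the two bounds, as a percentage
band_2_to_3_percent = round((within_3_std - within_2_std) * 100, 1)

print(within_2_std, min_obs_within_2_std, band_2_to_3_percent)
```

The 13.9% figure quoted for the band between two and three standard deviations corresponds to the difference between the k = 3 and k = 2 bounds.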

Visualizing Chebyshev's Theorem for all the ROI generated by the R-rated movies (between two and three standard deviations from the mean).

In [451]:
means = '59,600,710'
std = '111,311,472'
means1 = 60
std1 = 111
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))

def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-400, 400)
    s = [111]
    m = [60]
    c = ['#ff5500']

    for sig, mu, color in zip(s, m, c): 
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2,
                label=rf"$\mu={means}$" + "\n" + rf"$\sigma={std}$" + "\n")
    
    x = np.linspace(280, 350)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    
    x1 = np.linspace(-170, -250)
    y1 = norm.pdf(x1, means1, std1)  # density evaluated on the left-hand band
    ax.fill_between(x1, y1, alpha=0.5, color='#ff5500')

    ax.annotate('at least 13.9%\n at least 8 obs',xy=(250,0.0035), xytext=(250,0.0020),
            arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
   
    plt.xlim(-400, 400)
    plt.ylim(0, .004)
    plt.legend(fontsize=10)
    plt.title('Chebyshevs Theorem on the ROI \nof the R-rated Movies in the Drama Genre (n=55)',fontsize=14)
    plt.xlabel("ROI of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.show()

if __name__ == '__main__':
   main()

Visualizing the KDE and Jittered plot of all the ROI generated by the R-rated movies.

In [452]:
import seaborn as sns
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_roi_r, color='#ff5500');
sns.violinplot( data=df_roi_r,inner=None,color='0.8').set(title='KDE and Jittered strip plot\n on the ROI of the r-rated movies')
plt.show()

Visualizing the KDE and Swarm plot of all the ROI generated by the R-rated movies.

In [453]:
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.swarmplot(data=df_roi_r, color='#ff5500');
sns.violinplot( data=df_roi_r, color='0.8', inner=None, alpha=.2).set(title='KDE and swarm plot\n on the ROI of the r-rated movies')
#sns.despine()
plt.show()

Visualizing the KDE and Rug plot of all the ROI generated by the R-rated movies.

In [454]:
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_roi_r, color='#ff5500', jitter=False)
sns.violinplot(data=df_roi_r,  split=True,inner=None,
      scale="count", color='0.8', alpha=.1).set(title='KDE and rug plot\n on the ROI of the r-rated movies')
#sns.despine()
plt.show()

Styling the first portion of the Frequency Distribution Table of all the ROI generated by the R-rated movies.

In [296]:
freq_dis_roi1 = freq_dis_roi[:12].style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},#heading
            {'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}])

Saving the freq_dis_roi1 dataframe to the freq_dis_roi1.png file as an image to be used for the analysis later on.

In [297]:
dfi.export(freq_dis_roi1, 'freq_dis_roi1.png')

The 'freq_dis_roi1' dataframe.

Styling the second portion of the Frequency Distribution Table of all the ROI generated by the R-rated movies.

In [298]:
freq_dis_roi2 = freq_dis_roi[12:24].style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},#heading
            {'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}])

Saving the freq_dis_roi2 dataframe to the freq_dis_roi2.png file as an image to be used for the analysis later on.

In [302]:
dfi.export(freq_dis_roi2, 'freq_dis_roi2.png')

The 'freq_dis_roi2' dataframe.

Styling the last portion of the Frequency Distribution Table of all the ROI generated by the R-rated movies.

In [300]:
freq_dis_roi3 = freq_dis_roi[24:].style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},#heading
            {'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}])

Saving the freq_dis_roi3 dataframe to the freq_dis_roi3.png file as an image to be used for the analysis later on.

In [301]:
dfi.export(freq_dis_roi3, 'freq_dis_roi3.png')

The 'freq_dis_roi3' dataframe.

Styling the Cumulative Frequency Distribution Table of all the ROI generated by the R-rated movies.

In [1046]:
freq_cum_dis11 = freq_cum_dis1.style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},#heading
            {'selector':"td", "props":[("background-color","white"), ("color"," black"),
                                      ("font-size", "10pt")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}])

Saving the freq_cum_dis11 dataframe to the freq_cum_dis11.png file as an image to be used for the analysis later on.

In [1047]:
dfi.export(freq_cum_dis11, 'freq_cum_dis11.png')

The 'freq_cum_dis11' dataframe.

Styling the Cumulative Relative Frequency Distribution Table of all the ROI generated by the R-rated movies.

In [308]:
cum_rel_freq11 = cum_rel_freq1.style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},#heading
            {'selector':"td", "props":[("background-color","white"), ("color"," black"),
                                      ("font-size", "10pt")]},#inside chart
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}])

Saving the cum_rel_freq11 dataframe to the cum_rel_freq11.png file as an image to be used for the analysis later on.

In [310]:
dfi.export(cum_rel_freq11, 'cum_rel_freq11.png')

The 'cum_rel_freq11' dataframe.

Cumulative Relative Frequency Distribution Line Plot of all the ROI generated by the R-rated movies.

In [460]:
amount = [30000000, 60000000, 90000000, 120000000, 150000000, 180000000, 210000000, 240000000, 
       270000000, 300000000, 330000000, 360000000, 531000000]
freq = [67, 80, 87, 87, 89, 89, 89, 89, 89, 89, 96, 98, 100]
  
plt.plot( amount, freq ,color='red', marker='o')
plt.title('Cumulative relative frequency (%) of \n the ROI made on R-rated movies', fontsize=14)
plt.xlabel('Return On Investment', fontsize=14)
plt.ylabel('Cumulative relative frequency (%)', fontsize=14)
plt.grid(True)
plt.show()

Getting the ROI Percentage on all of the R-rated movies.

In [318]:
roi_per = []
for i in system_rating_r["ROI Percentage"]:
    i = int(i.replace('%', ''))
    roi_per.append(i)
print(roi_per) #showing the roi_per list
[350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81, 133, 41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250, 258, 5, 1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970, 850, 1557, 444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]
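
The conversion above relies on every entry looking exactly like '350%'. A slightly more defensive sketch is shown below, under the assumption (not confirmed by the data) that some entries might carry surrounding spaces or thousands separators. The function name and the sample strings are illustrative.

```python
def parse_percent(text):
    """Convert a string such as '1,327%' into the integer 1327."""
    cleaned = text.strip().rstrip('%').replace(',', '')
    return int(cleaned)


samples = ['350%', ' 1,327% ', '5%']  # illustrative strings
print([parse_percent(s) for s in samples])
```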

Checking the number of elements in the 'roi_per' list.

In [319]:
len(roi_per)
Out[319]:
55

Putting the roi_per of all the R-rated movies into a dataframe called df_roi_per_r.

In [320]:
df_roi_per_r = pd.DataFrame({"ROI Percentage":roi_per})

The 'df_roi_per_r' dataframe. (this dataframe is interactive)

In [438]:
df_roi_per_r
Out[438]:
ROI Percentage

Getting the Arithmetic Mean of the ROI Percentage of all of the R-rated movies.

In [463]:
x = statistics.mean(roi_per)
print("Arithmetic Mean of the ROI for the R-rated movies is:", x)
Arithmetic Mean of the ROI for the R-rated movies is: 508.8727272727273

Getting the Median of the ROI Percentage of all of the R-rated movies.

In [464]:
print("Median of the ROI for the R-rated movies is:", statistics.median(roi_per))
Median of the ROI for the R-rated movies is: 263

Getting the sample Standard Deviation of the ROI Percentage of all of the R-rated movies.

In [465]:
print("Standard deviation of the  ROI Percentage for the R-rated movies is:", np.std(roi_per, ddof=1))
Standard deviation of the  ROI Percentage for the R-rated movies is: 590.4479797873956

Getting the Coefficient of Variation of the ROI Percentage of all of the R-rated movies.

In [466]:
cv = lambda x: np.std(x, ddof=1) / np.mean(x) * 100
print("Coefficient of Variation of the  ROI Percentage for the R-rated movies is:", cv(roi_per))
Coefficient of Variation of the  ROI Percentage for the R-rated movies is: 116.03058056419451

Getting Pearson’s Coefficient of Skewness of the ROI Percentage of all of the R-rated movies.

In [467]:
def pearsons(mean, median, standard_deviation):
    skewness = (mean-median)*3/standard_deviation
    return skewness
print("Pearson’s Coefficient of Skewness of the ROI Percentage for the R-rated movies is:", 
      pearsons( statistics.mean(roi_per),statistics.median(roi_per),np.std(roi_per, ddof=1)))   
Pearson’s Coefficient of Skewness of the ROI Percentage for the R-rated movies is: 1.2492517665718463
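
The summary statistics computed in the cells above can also be gathered in one place. The sketch below is a minimal consolidation that uses only the standard-library `statistics` module; the `roi_per` values are copied from the output shown earlier, and the skewness is Pearson's second coefficient, 3 × (mean − median) / standard deviation, as in the `pearsons` function.

```python
import statistics

# ROI percentages of the 55 R-rated movies, copied from the roi_per output above
roi_per = [350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81,
           133, 41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250,
           258, 5, 1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970,
           850, 1557, 444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]

mean = statistics.mean(roi_per)
median = statistics.median(roi_per)
std = statistics.stdev(roi_per)       # sample standard deviation (ddof=1)
cv = std / mean * 100                 # coefficient of variation, in percent
skew = 3 * (mean - median) / std      # Pearson's second skewness coefficient

print(f"mean={mean:.2f} median={median} std={std:.2f} cv={cv:.2f}% skew={skew:.2f}")
```

A mean well above the median, together with a positive skewness, points to a long right tail: a few movies with very high ROI percentages pull the mean upward.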

Applying Chebyshev's Theorem to the ROI Percentage of all of the R-rated movies.

In [468]:
def chebyshevs(mean, standard_deviation, num_std, previous_p):
    # Distance from the mean covered by num_std standard deviations
    position_std = num_std*standard_deviation
    lower_range = mean - position_std
    if lower_range < 0: lower_range = 0  # the lower bound is floored at 0
    upper_range = mean + position_std
    if num_std == 2: 
        print('At least 75% of the ROI Percentage of the r-rated movies ranges from',lower_range,'to',upper_range)
    if num_std == 3: 
        print('At least 13.9% of the ROI Percentage of the r-rated movies ranges from',previous_p,'to',upper_range)
chebyshevs(510, 590, 2, 0)
chebyshevs(510, 590, 3, 1690)
At least 75% of the ROI Percentage of the r-rated movies ranges from 0 to 1690
At least 13.9% of the ROI Percentage of the r-rated movies ranges from 1690 to 2280
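
Chebyshev's inequality states that at least 1 − 1/k² of the observations lie within k standard deviations of the mean (for k > 1). The sketch below is a minimal, hedged illustration of that bound using the rounded mean (510) and standard deviation (590) from the cell above; `chebyshev_interval` is a hypothetical helper, not part of the notebook. Note that the 13.9% figure above is the difference between the two bounds (88.9% − 75%); the theorem itself only bounds the share inside each interval.

```python
def chebyshev_interval(mean, std, k, floor=0.0):
    """Return (lower, upper, min_share) for the interval mean +/- k*std.

    Chebyshev's inequality guarantees that at least 1 - 1/k**2 of the
    observations lie inside this interval when k > 1.
    """
    if k <= 1:
        raise ValueError("k must be greater than 1")
    lower = max(mean - k * std, floor)  # floored at 0, as in the cell above
    upper = mean + k * std
    min_share = 1 - 1 / k**2
    return lower, upper, min_share

for k in (2, 3):
    lower, upper, share = chebyshev_interval(510, 590, k)
    print(f"k={k}: at least {share:.1%} of values lie between {lower} and {upper}")
```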

Getting the Kurtosis of the ROI Percentage of all of the R-rated movies.

In [469]:
print('Kurtosis of the ROI Percentage of the r-rated movies is:',kurtosis(roi_per, fisher=False))
print('Excess Kurtosis of the ROI Percentage of the r-rated movies is:',
      (kurtosis(roi_per,fisher=False)-3))#leptokurtic
Kurtosis of the ROI Percentage of the r-rated movies is: 6.600168491255548
Excess Kurtosis of the ROI Percentage of the r-rated movies is: 3.6001684912555483

Getting the Arithmetic Mean and the Trimmed Mean of the ROI Percentage of all of the R-rated movies.

In [470]:
print("Arithmetic Mean of the ROI for the R-rated movies is:", statistics.mean(roi_per))
print('10% Trimmed mean of the ROI of the r-rated movies is:',stats.trim_mean(roi_per, 0.10))
Arithmetic Mean of the ROI for the R-rated movies is: 508.8727272727273
10% Trimmed mean of the ROI of the r-rated movies is: 401.55555555555554
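
A trimmed mean drops a fixed share of the smallest and largest values before averaging, which reduces the influence of extreme ROI percentages. The sketch below is a minimal hand-written version, assuming the same convention as `stats.trim_mean` (the number of values cut from each end is rounded down); `trimmed_mean` is a hypothetical helper name.

```python
def trimmed_mean(values, proportion):
    """Drop int(proportion * n) values from each end, then average the rest."""
    ordered = sorted(values)
    cut = int(proportion * len(ordered))      # number of values removed per end
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

# Small illustrative data set: the outlier 100 is removed by a 20% trim
print(trimmed_mean([1, 2, 3, 4, 100], 0.20))
```

For the 55 ROI percentages, a 10% trim drops the 5 smallest and 5 largest values and averages the remaining 45, which gives 18070 / 45 ≈ 401.56, in line with the output above.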

Rounding the ROI Percentage of the R-rated movies to approximate round values and storing them in a list called 'freq_demo'.

In [322]:
freq_demo = [5, 5, 10, 20, 20, 20, 40, 40, 40, 40, 40, 40, 50, 70, 70, 80, 130, 130,
            150, 180, 200, 220, 240, 300, 300, 300, 400, 400, 400, 400, 500, 500, 500, 
            500, 600, 600, 600, 700, 700, 700, 800, 800, 800, 900, 1000, 1100, 1300, 
            1300, 2000, 2000, 2500, 3000]
print(freq_demo) #showing the freq_demo list
[5, 5, 10, 20, 20, 20, 40, 40, 40, 40, 40, 40, 50, 70, 70, 80, 130, 130, 150, 180, 200, 220, 240, 300, 300, 300, 400, 400, 400, 400, 500, 500, 500, 500, 600, 600, 600, 700, 700, 700, 800, 800, 800, 900, 1000, 1100, 1300, 1300, 2000, 2000, 2500, 3000]

Checking the number of elements in the 'freq_demo' list.

In [936]:
len(freq_demo)
Out[936]:
52

Getting the Frequency of the Repeated Values of the rounded ROI Percentage of the R-rated Drama movies, which will be stored in a Counter (a dictionary subclass) called 'freq_demo1'.

In [323]:
freq_demo1 = Counter((freq_demo))
print(freq_demo1)#showing the freq_demo1 list
Counter({40: 6, 400: 4, 500: 4, 20: 3, 300: 3, 600: 3, 700: 3, 800: 3, 5: 2, 70: 2, 130: 2, 1300: 2, 2000: 2, 10: 1, 50: 1, 80: 1, 150: 1, 180: 1, 200: 1, 220: 1, 240: 1, 900: 1, 1000: 1, 1100: 1, 2500: 1, 3000: 1})

Sorting the 'freq_demo1' dictionary in ascending order of ROI Percentage.

In [324]:
freq_one = sorted(freq_demo1.items(), key=lambda i: i[0])
print(freq_one)#showing the freq_one list
[(5, 2), (10, 1), (20, 3), (40, 6), (50, 1), (70, 2), (80, 1), (130, 2), (150, 1), (180, 1), (200, 1), (220, 1), (240, 1), (300, 3), (400, 4), (500, 4), (600, 3), (700, 3), (800, 3), (900, 1), (1000, 1), (1100, 1), (1300, 2), (2000, 2), (2500, 1), (3000, 1)]

Creating a list called 'roi_per_freq' with the ROI Percentage values of the R-rated Drama movies from the 'freq_one' list.

In [325]:
roi_per_freq = []
for i in freq_one: 
    roi_per_freq.append("{:}%".format(i[0]))
print(roi_per_freq)#showing the roi_per_freq list
['5%', '10%', '20%', '40%', '50%', '70%', '80%', '130%', '150%', '180%', '200%', '220%', '240%', '300%', '400%', '500%', '600%', '700%', '800%', '900%', '1000%', '1100%', '1300%', '2000%', '2500%', '3000%']

Checking the number of elements in the 'roi_per_freq' list.

In [956]:
len(roi_per_freq)
Out[956]:
26

Creating a list called 'roi_per_freq_amount' with the frequency of the values from 'freq_one' list.

In [326]:
roi_per_freq_amount = []
for i in freq_one: 
    roi_per_freq_amount.append(i[1])
print(roi_per_freq_amount)#showing the roi_per_freq_amount list
[2, 1, 3, 6, 1, 2, 1, 2, 1, 1, 1, 1, 1, 3, 4, 4, 3, 3, 3, 1, 1, 1, 2, 2, 1, 1]

Checking the number of elements in the 'roi_per_freq_amount' list.

In [958]:
len(roi_per_freq_amount)
Out[958]:
26

Creating a Frequency Distribution Table called 'freq_dis3' of the ROI Percentage of all of the R-rated movies.

In [327]:
freq_dis3 = pd.DataFrame({"ROI\nPercentage (x)":roi_per_freq,
                                 "Frequency (f)":roi_per_freq_amount})

The 'freq_dis3' dataframe. (this dataframe is interactive)

In [439]:
freq_dis3
Out[439]:
ROI Percentage (x) Frequency (f)

Getting the Upper Values and Lower Values of all the ROI Percentage on all of the R-rated movies, for the Cumulative Frequency Distribution Table.

In [963]:
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
        
#a =list(chunks(range(0, 2700), 150))
a =list(chunks(range(0, 2800), 200))
a#showing the a list
Out[963]:
[range(0, 200),
 range(200, 400),
 range(400, 600),
 range(600, 800),
 range(800, 1000),
 range(1000, 1200),
 range(1200, 1400),
 range(1400, 1600),
 range(1600, 1800),
 range(1800, 2000),
 range(2000, 2200),
 range(2200, 2400),
 range(2400, 2600),
 range(2600, 2800)]

Finalizing the Lower Values for the Cumulative Frequency Distribution Table.

In [329]:
lower_val = ['0%', '151%', '302%', '453%', '604%', '755%', '906%', '1057%', '1208%', '1359%', 
            '1510%', '1661%', '1812%', '1963%', '2114%', '2265%', '2416%', '2567%' ]
print(lower_val)#showing the lower_val list
['0%', '151%', '302%', '453%', '604%', '755%', '906%', '1057%', '1208%', '1359%', '1510%', '1661%', '1812%', '1963%', '2114%', '2265%', '2416%', '2567%']

Checking the number of elements in the 'lower_val' list.

In [330]:
len(lower_val)
Out[330]:
18

Finalizing the Upper Values for the Cumulative Frequency Distribution Table.

In [331]:
upper_val = ['150%', '301%', '452%', '603%', '754%', '905%', '1056%', '1207%','1358%', '1509%',
            '1660%', '1811%', '1962%', '2113%', '2264%', '2415%', '2566%', '2717%']
print(upper_val)#showing the upper_val list
['150%', '301%', '452%', '603%', '754%', '905%', '1056%', '1207%', '1358%', '1509%', '1660%', '1811%', '1962%', '2113%', '2264%', '2415%', '2566%', '2717%']

Checking the number of elements in the 'upper_val' list.

In [967]:
len(upper_val)
Out[967]:
18

Getting the Frequency Amount of the values in between the Lower Values and Upper Values for the Cumulative Frequency Distribution Table.

In [332]:
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
count10 = 0
count11 = 0
count12 = 0
count13 = 0
count14 = 0
count15 = 0
count16 = 0
count17 = 0
count18 = 0
for i in roi_per: 
    if 0 <= i <= 150:
        count1+=1
    if 151 <= i <= 301:
        count2+=1
    if 302 <= i <= 452:
        count3+=1
    if 453 <= i <= 603:
        count4+=1
    if 604 <= i <= 754:
        count5+=1
    if 755 <= i <= 905:
        count6+=1
    if 906 <= i <= 1056:
        count7+=1
    if 1057 <= i <= 1207:
        count8+=1
    if 1208 <= i <= 1358:
        count9+=1
    if 1359 <= i <= 1509:
        count10+=1
    if 1510 <= i <= 1660:
        count11+=1
    if 1661 <= i <= 1811:
        count12+=1
    if 1812 <= i <= 1962:
        count13+=1
    if 1963 <= i <= 2113:
        count14+=1
    if 2114 <= i <= 2264:
        count15+=1
    if 2265 <= i <= 2415:
        count16+=1
    if 2416 <= i <= 2566:
        count17+=1
    if 2567 <= i <= 2717:
        count18+=1

freq_amount = [count1,count2,count3,count4,count5,count6,count7,count8,count9,count10,
              count11,count12,count13,count14,count15,count16,count17,count18]
print(freq_amount)#showing the freq_amount list
[18, 10, 5, 6, 3, 4, 2, 1, 2, 0, 1, 0, 1, 0, 0, 0, 1, 1]
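
Because every class above covers 151 consecutive integer values (0 to 150, 151 to 301, and so on), the class of a value can be found with integer division instead of eighteen separate counters. The sketch below is a minimal alternative under that assumption; `BIN_WIDTH`, `N_BINS` and `freq_by_bin` are illustrative names, and the `roi_per` values are copied from the output shown earlier.

```python
# ROI percentages of the 55 R-rated movies, copied from the roi_per output above
roi_per = [350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81,
           133, 41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250,
           258, 5, 1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970,
           850, 1557, 444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]

BIN_WIDTH = 151   # each class covers 151 integer values, e.g. 0-150, 151-301
N_BINS = 18       # classes run from 0% up to 2717%

freq_by_bin = [0] * N_BINS
for value in roi_per:
    index = value // BIN_WIDTH      # class number for this value
    if 0 <= index < N_BINS:         # ignore values outside 0%-2717%
        freq_by_bin[index] += 1

print(freq_by_bin)
```

The counts agree with the `freq_amount` list printed above, and changing the class width only requires editing the two constants.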

Checking the number of elements in the 'freq_amount' list.

In [1000]:
len(freq_amount)
Out[1000]:
18

Getting the Frequency Percentage of the values in between the Lower Values and Upper Values for the Cumulative Frequency Distribution Table.

In [333]:
freq_amount_percent_demo = [count1/55*100,count2/55*100,count3/55*100,count4/55*100,
                       count5/55*100,count6/55*100,count7/55*100,count8/55*100,
                       count9/55*100,count10/55*100,count11/55*100,count12/55*100,
                       count13/55*100,count14/55*100,count15/55*100,count16/55*100,
                       count17/55*100,count18/55*100,]
freq_amount_percent_demo1 = [33, 18, 9, 11, 5, 7, 3, 2, 4, 0, 2, 0, 2, 0, 0, 0, 2, 2]
print(freq_amount_percent_demo1)#showing the freq_amount_percent_demo1 list
[33, 18, 9, 11, 5, 7, 3, 2, 4, 0, 2, 0, 2, 0, 0, 0, 2, 2]

Checking the number of elements in the 'freq_amount_percent_demo1' list.

In [1001]:
len(freq_amount_percent_demo1)
Out[1001]:
18

Turning the integers in the freq_amount_percent_demo1 list into strings with the percentage symbol.

In [334]:
freq_amount_percent = []
for i in freq_amount_percent_demo1:
    freq_amount_percent.append("{:}%".format(i))
print(freq_amount_percent)#showing the freq_amount_percent list
['33%', '18%', '9%', '11%', '5%', '7%', '3%', '2%', '4%', '0%', '2%', '0%', '2%', '0%', '0%', '0%', '2%', '2%']

Checking the number of elements in the 'freq_amount_percent' list.

In [1002]:
len(freq_amount_percent)
Out[1002]:
18

Getting the Cumulative Frequency Amount of the values in between the Lower Values and Upper Values for the Cumulative Frequency Distribution Table.

In [335]:
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[18, 28, 33, 39, 42, 46, 48, 49, 51, 51, 52, 52, 53, 53, 53, 53, 54, 55]
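
The `Cumulative` helper is defined elsewhere in the notebook. Assuming it simply returns running totals, the standard library offers an equivalent; the sketch below reproduces the list above with `itertools.accumulate` (`running_total` is an illustrative name).

```python
from itertools import accumulate

# Class frequencies, copied from the freq_amount output above
freq_amount = [18, 10, 5, 6, 3, 4, 2, 1, 2, 0, 1, 0, 1, 0, 0, 0, 1, 1]

# Running total: each entry is the sum of all frequencies up to that class
running_total = list(accumulate(freq_amount))
print(running_total)
```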

Checking the number of elements in the 'freq_cumulative_amount' list.

In [1003]:
len(freq_cumulative_amount)
Out[1003]:
18

Getting the Cumulative Frequency Percentage of the values in between the Lower Values and Upper Values for the Cumulative Frequency Distribution Table.

In [336]:
freq_cumulative_percent_demo = Cumulative(freq_amount_percent_demo1)
print(freq_cumulative_percent_demo)#showing the freq_cumulative_percent_demo list
[33, 51, 60, 71, 76, 83, 86, 88, 92, 92, 94, 94, 96, 96, 96, 96, 98, 100]

Checking the number of elements in the 'freq_cumulative_percent_demo' list.

In [1004]:
len(freq_cumulative_percent_demo)
Out[1004]:
18

Turning the integers in the freq_cumulative_percent_demo list into strings with the percentage symbol.

In [337]:
freq_cumulative_percent = []
for i in freq_cumulative_percent_demo:
    freq_cumulative_percent.append("{:}%".format(i))
print(freq_cumulative_percent)#showing the freq_cumulative_percent list
['33%', '51%', '60%', '71%', '76%', '83%', '86%', '88%', '92%', '92%', '94%', '94%', '96%', '96%', '96%', '96%', '98%', '100%']

Checking the number of elements in the 'freq_cumulative_percent' list.

In [1005]:
len(freq_cumulative_percent)
Out[1005]:
18

Creating the Cumulative Frequency Distribution Table of the ROI Percentage of all the R-rated movies, using the necessary variables.

In [338]:
freq_cum_dis2 = pd.DataFrame({"Lower\nValue":lower_val,
                             "Upper\nValue":upper_val,
                             "Frequency (f)":freq_amount,
                             "Percentage (%)":freq_amount_percent,
                            "Cumulative\nFrequency":freq_cumulative_amount,
                            "Cumulative\nPercentage":freq_cumulative_percent})

The 'freq_cum_dis2' table. (this table is interactive)

In [440]:
freq_cum_dis2
Out[440]:
Lower Value Upper Value Frequency (f) Percentage (%) Cumulative Frequency Cumulative Percentage

Getting the Frequency Amount of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.

In [340]:
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
count10 = 0
count11 = 0
count12 = 0
count13 = 0
count14 = 0
for i in roi_per: 
    if i < 200:
        count1+=1
    if 200 <= i < 400:
        count2+=1
    if 400 <= i < 600:
        count3+=1
    if 600 <= i < 800:
        count4+=1
    if 800 <= i < 1000:
        count5+=1
    if 1000 <= i < 1200:
        count6+=1
    if 1200 <= i < 1400:
        count7+=1
    if 1400 <= i < 1600:
        count8+=1
    if 1600 <= i < 1800:
        count9+=1
    if 1800 <= i < 2000:
        count10+=1
    if 2000 <= i < 2200:
        count11+=1
    if 2200 <= i < 2400:
        count12+=1
    if 2400 <= i < 2600:
        count13+=1
    if 2600 <= i <= 2800:
        count14+=1
freq_amount = [count1,count2,count3,count4,count5,count6,count7,count8,count9,count10,
              count11,count12,count13,count14]
print(freq_amount)#showing the freq_amount list
[22, 7, 9, 4, 5, 2, 2, 1, 0, 1, 0, 0, 1, 1]

Checking the number of elements in the 'freq_amount' list.

In [1015]:
len(freq_amount)
Out[1015]:
14

Getting the Frequency Percentage of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.

In [341]:
cum_rel_freq_demo = [i/55*100 for i in freq_amount]
cum_rel_freq_demo1 = [40,13,16,7,9,4,3,2,0,2,0,0,2,2]
print(cum_rel_freq_demo1)#showing the cum_rel_freq_demo1 list
[40, 13, 16, 7, 9, 4, 3, 2, 0, 2, 0, 0, 2, 2]

Checking the number of elements in the 'cum_rel_freq_demo1' list.

In [1016]:
len(cum_rel_freq_demo1)
Out[1016]:
14

Getting the Cumulative Relative Frequency Percentage of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.

In [342]:
cum_rel_freq_demo2 = Cumulative(cum_rel_freq_demo1)
print(cum_rel_freq_demo2)#showing the cum_rel_freq_demo2 list
[40, 53, 69, 76, 85, 89, 92, 94, 94, 96, 96, 96, 98, 100]

Checking the number of elements in the 'cum_rel_freq_demo2' list.

In [1017]:
len(cum_rel_freq_demo2)
Out[1017]:
14

Turning the integers in the cum_rel_freq_demo2 list into strings with the percentage symbol.

In [343]:
cum_rel_freq_percent = []
for i in cum_rel_freq_demo2:
    cum_rel_freq_percent.append("{:}%".format(i))
print(cum_rel_freq_percent)#showing the cum_rel_freq_percent list
['40%', '53%', '69%', '76%', '85%', '89%', '92%', '94%', '94%', '96%', '96%', '96%', '98%', '100%']

Checking the number of elements in the 'cum_rel_freq_percent' list.

In [1018]:
len(cum_rel_freq_percent)
Out[1018]:
14

Getting the Cumulative Frequency Amount of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.

In [344]:
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[22, 29, 38, 42, 47, 49, 51, 52, 52, 53, 53, 53, 54, 55]

Checking the number of elements in the 'freq_cumulative_amount' list.

In [1019]:
len(freq_cumulative_amount)
Out[1019]:
14

Finalizing the Intervals for the Cumulative Relative Frequency Distribution Table.

In [345]:
intervals_cum = [ '< 200%','200% to < 400%','400% to < 600%','600% to < 800%','800% to < 1000%',
 '1000% to < 1200%', '1200% to < 1400%', '1400% to < 1600%','1600% to < 1800%',
 '1800% to < 2000%','2000% to < 2200%','2200% to < 2400%','2400% to < 2600%', '2600% to 2800%']
print(intervals_cum)#showing the intervals_cum list
['< 200%', '200% to < 400%', '400% to < 600%', '600% to < 800%', '800% to < 1000%', '1000% to < 1200%', '1200% to < 1400%', '1400% to < 1600%', '1600% to < 1800%', '1800% to < 2000%', '2000% to < 2200%', '2200% to < 2400%', '2400% to < 2600%', '2600% to 2800%']

Checking the number of elements in the 'intervals_cum' list.

In [1020]:
len(intervals_cum)
Out[1020]:
14

Creating the Cumulative Relative Frequency Distribution Table of the ROI Percentage of all the R-rated movies, using the necessary variables.

In [346]:
cum_rel_freq2 = pd.DataFrame({"Return On Investment":intervals_cum,
                             "Frequency (f)":freq_amount,
                             "Cumulative Frequency":freq_cumulative_amount,
                             "Cumulative Relative Frequency Percentage":cum_rel_freq_percent,
                            })

The 'cum_rel_freq2' table. (this table is interactive)

In [441]:
cum_rel_freq2
Out[441]:
Return On Investment Frequency (f) Cumulative Frequency Cumulative Relative Frequency Percentage
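
The whole cumulative relative frequency table can also be derived directly from the raw values. The sketch below is a minimal alternative using `np.histogram` and `np.cumsum`, assuming 200-point classes from 0% to 2800% as in the cells above; the column names and the `table` variable are illustrative, and the `roi_per` values are copied from the output shown earlier.

```python
import numpy as np
import pandas as pd

# ROI percentages of the 55 R-rated movies, copied from the roi_per output above
roi_per = [350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81,
           133, 41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250,
           258, 5, 1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970,
           850, 1557, 444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]

edges = np.arange(0, 2801, 200)                # 0, 200, ..., 2800 -> 14 classes
counts, _ = np.histogram(roi_per, bins=edges)  # last class includes its upper edge
cum_counts = np.cumsum(counts)
cum_percent = cum_counts / len(roi_per) * 100

table = pd.DataFrame({
    "Lower (%)": edges[:-1],
    "Upper (%)": edges[1:],
    "Frequency (f)": counts,
    "Cumulative Frequency": cum_counts,
    "Cumulative Relative Frequency (%)": cum_percent.round(1),
})
print(table.to_string(index=False))
```

Cumulative percentages computed from the cumulative counts can differ by about a point from a running sum of already-rounded class percentages, because the rounding is applied only once at the end.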

Visualizing the Normal Distribution of the ROI Percentage of all of the R-rated movies.

In [592]:
means = '510%'
std = '590%'

def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))

def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-40, 40)
    s = [5.9]
    m = [5.1]
    c = ['#ff5500']


    for sig, mu, color in zip(s, m, c): 
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")

   
    plt.xlim(-40, 40)
    plt.ylim(0, .2)
    plt.legend(fontsize=11)
    plt.title('Variability of ROI Percentage of R-rated Movies\n Normal Distribution, Mean = 510%, StDev=590%',fontsize=14)
    plt.xlabel("ROI Percentage of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.grid(False)
    plt.show()

if __name__ == '__main__':
   main()
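
The `make_gauss` helper evaluates the normal density by hand. With N = 1 it should agree with `scipy.stats.norm.pdf`, which the sketch below checks on the same grid and with the same scaled parameters (mean 5.1, standard deviation 5.9, i.e. the percentages divided by 100). The names `by_hand` and `by_scipy` are illustrative.

```python
import numpy as np
from scipy.stats import norm

def make_gauss(N, sig, mu):
    # Same formula as in the cell above: N times the normal density
    return lambda x: N / (sig * (2 * np.pi) ** 0.5) * np.e ** (-(x - mu) ** 2 / (2 * sig ** 2))

x = np.arange(-40, 40)
by_hand = make_gauss(1, 5.9, 5.1)(x)
by_scipy = norm.pdf(x, loc=5.1, scale=5.9)

# True when the hand-written density matches SciPy's implementation
print(np.allclose(by_hand, by_scipy))
```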

Visualizing the Variance of the ROI Percentage of all of the R-rated movies.

In [593]:
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
plt.ylabel('ROI Percentage of R-rated Movies',fontsize=14)
plt.xlabel('Ranking of Values',fontsize=14)
plt.title("The Variance of all the ROI Percentage\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title 
plt.scatter(x=df_roi_r.index, y=df_roi_r['ROI Percentage'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_roi_r['ROI Percentage'].mean(), xmin=0, xmax=55, color='blue') # Mean line
plt.grid(False)
plt.show()# Telling matplotlib to show the chart

Visualizing the Variance, using Two Standard Deviations, of the ROI Percentage of all of the R-rated movies.

In [594]:
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
plt.ylabel('ROI Percentage of R-rated Movies',fontsize=14)
plt.xlabel('Position of Values',fontsize=14)
plt.title("The Variance of all the ROI Percentage\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title 
plt.scatter(x=df_roi_r.index, y=df_roi_r['ROI Percentage'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_roi_r['ROI Percentage'].mean(), xmin=0, xmax=55, color='blue') # Mean line

for std_int in [-2, -1, 1, 2]: # Going through different stds from the mean
    standard_deviation = df_roi_r['ROI Percentage'].mean() + df_roi_r['ROI Percentage'].std()*std_int
    
    plt.hlines(y=standard_deviation,
               xmin=0,
               xmax=55,
               linestyles='dashed',
               colors='green'); # dashed line at this multiple of the standard deviation
    
    # Giving labels to the lines we just drew
    plt.text(y=standard_deviation + 2, x=-10, s=std_int, ha='center')
    plt.grid(False)

Visualizing Pearson’s Coefficient of Skewness of the ROI Percentage of all of the R-rated movies.

In [595]:
import matplotlib.pyplot as plt

# An "interface" to matplotlib.axes.Axes.hist() method
n, bins, patches = plt.hist(x=roi_per, bins='auto', color='#ff5500',
                            alpha=0.7, rwidth=0.85)
plt.grid(False)
plt.grid(axis='y', alpha=0.75)
plt.xlabel('ROI Percentage of R-rated Movies',fontsize=14)
plt.ylabel('Frequency',fontsize=14)
#plt.text( x=np.min(cost), y=0.1, s=r'$\mu=16 million, b=20 million$')
plt.title('The Pearson’s Coefficient of Skewness for the ROI Percentage\n of all R-rated movies is 1.25 (n=55)',fontsize=14)
Out[595]:
Text(0.5, 1.0, 'The Pearson’s Coefficient of Skewness for the ROI Percentage\n of all R-rated movies is 1.25 (n=55)')

Visualizing the Comparison of the Mode, Median and Mean of the ROI Percentage of all of the R-rated movies.

In [493]:
# An "interface" to matplotlib.axes.Axes.hist() method
median_roi_per = statistics.median(roi_per)
mean_roi_per = 510
mode_roi_per = statistics.mode(roi_per)
n, bins, patches = plt.hist(x=roi_per, bins='auto', color='#ff5500',
                            alpha=0.2, rwidth=0.85)
plt.grid(axis='y', alpha=0.75)
names = ["median", "mean", "mode"]
colors = ['green', 'red', 'blue']
measurements = [median_roi_per, mean_roi_per, mode_roi_per]
for measurement, name, color in zip(measurements, names, colors):
    plt.axvline(x=measurement,  linestyle='--', linewidth=2.5, label='{0} at {1}'.format(name, measurement), c=color)
plt.legend(fontsize=10);
plt.title('Comparison of Mode, Median and Mean in the Distribution\n of the ROI Percentage of all the R-rated Drama movies',fontsize=14)
Out[493]:
Text(0.5, 1.0, 'Comparison of Mode, Median and Mean in the Distribution\n of the ROI Percentage of all the R-rated Drama movies')

Visualizing Chebyshev's Theorem within two standard deviations for the ROI Percentage of all of the R-rated movies.

In [494]:
means = '510%'
std = '590%'
means1 = 5.1
std1 = 5.9
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))

def main():

    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-50, 50)
    s = [5.9]
    m = [5.1]
    c = ['#ff5500']

    for sig, mu, color in zip(s, m, c): 
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")
    
    x = np.linspace(means1 - std1*2, means1 + std1*2)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    ax.annotate('at least 75%\n at least 41 obs', xy=(250,0.0035), xytext=(250,0.0020),
            arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
    
   
    plt.xlim(-50, 50)
    plt.ylim(0, .07)
    plt.legend(fontsize=10)
    plt.title('Chebyshevs Theorem on the ROI Percentage\nof the R-rated Movies in the Drama Genre (n=55)',fontsize=14)
    plt.xlabel("ROI Percentage of R-rated Movies",fontsize=14)
    plt.ylabel("Density", fontsize=14)
    plt.show()

if __name__ == '__main__':
   main()

Visualizing Chebyshev's Theorem between two and three standard deviations for the ROI Percentage of all of the R-rated movies.

In [495]:
means = '510%'
std = '590%'
means1 = 5.1
std1 = 5.9
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))

def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-40, 40)
    s = [5.9]
    m = [5.1]
    c = ['#ff5500']

    for sig, mu, color in zip(s, m, c): 
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")
    
    x = np.linspace(17, 20)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    
    x1 = np.linspace(-7, -10)
    y1 = norm.pdf(x1, means1, std1)  # density evaluated on the left-hand band itself
    ax.fill_between(x1, y1, alpha=0.5, color='#ff5500')

    ax.annotate('at least 13.9%\n at least 8 obs',xy=(250,0.0035), xytext=(250,0.0020),
            arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
   
    plt.xlim(-40, 40)
    plt.ylim(0, .07)
    plt.legend(fontsize=10)
    plt.title('Chebyshevs Theorem on the ROI \nof the R-rated Movies in the Drama Genre (n=55)',fontsize=14)
    plt.xlabel("ROI of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.show()

if __name__ == '__main__':
   main()

Visualizing the KDE and Jittered strip plot of the ROI Percentage of all of the R-rated movies.

In [496]:
import seaborn as sns
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_roi_r, color='#ff5500');
sns.violinplot( data=df_roi_r,inner=None,color='0.8').set(title='KDE and Jittered strip plot\n on the ROI Percentage of the r-rated movies')
plt.show()

Visualizing the KDE and Swarm plot of the ROI Percentage of all of the R-rated movies.

In [497]:
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.swarmplot(data=df_roi_r, color='#ff5500');
sns.violinplot( data=df_roi_r, color='0.8', inner=None, alpha=.2).set(title='KDE and swarm plot\n on the ROI Percentage of the r-rated movies')
#sns.despine()
plt.show()

Visualizing the KDE and Rug plot of the ROI Percentage of all of the R-rated movies.

In [498]:
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_roi_r, color='#ff5500', jitter=False)
sns.violinplot(data=df_roi_r,  split=True,inner=None,
      scale="count", color='0.8', alpha=.1).set(title='KDE and rug plot\n on the ROI Percentage of the r-rated movies')
#sns.despine()
plt.show()

Styling the first portion of the Frequency Distribution Table of the ROI Percentage of all of the R-rated movies.

In [1050]:
freq_dis_per = freq_dis[:12].style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},# heading
            {'selector':"td", "props":[("background-color","white"), ("color"," black")]},# table cells
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}])

Saving the freq_dis_per dataframe to the freq_dis_per.png file as an image to be used for the analysis later on.

In [1051]:
dfi.export(freq_dis_per, 'freq_dis_per.png')

The 'freq_dis_per' dataframe.

Styling the second portion of the Frequency Distribution Table of the ROI Percentage of all of the R-rated movies.

In [1052]:
freq_dis_per1 = freq_dis[12:].style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},# heading
            {'selector':"td", "props":[("background-color","white"), ("color"," black")]},# table cells
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}])

Saving the freq_dis_per1 dataframe to the freq_dis_per1.png file as an image to be used for the analysis later on.

In [1053]:
dfi.export(freq_dis_per1, 'freq_dis_per1.png')

The 'freq_dis_per1' dataframe.

Styling the Cumulative Frequency Distribution Table of the ROI Percentage of all of the R-rated movies.

In [1054]:
freq_cum_dis_per2 = freq_cum_dis2.style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},# heading
            {'selector':"td", "props":[("background-color","white"), ("color"," black"),
                                      ("font-size", "10pt")]},# table cells
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}])

Saving the freq_cum_dis_per2 dataframe to the freq_cum_dis_per2.png file as an image to be used for the analysis later on.

In [1055]:
dfi.export(freq_cum_dis_per2, 'freq_cum_dis_per2.png')

The 'freq_cum_dis_per2' dataframe.

Styling the Cumulative Relative Frequency Distribution Table of the ROI Percentage of all of the R-rated movies.

In [1056]:
cum_rel_freq_per2 = cum_rel_freq2.style.hide(axis="index")\
            .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
            {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                         ("font-size" , "12pt")]},# heading
            {'selector':"td", "props":[("background-color","white"), ("color"," black"),
                                      ("font-size", "10pt")]},# table cells
            {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
            {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}])

Saving the cum_rel_freq_per2 dataframe to the cum_rel_freq_per2.png file as an image to be used for the analysis later on.

In [1057]:
dfi.export(cum_rel_freq_per2, 'cum_rel_freq_per2.png')

The 'cum_rel_freq_per2' dataframe.

Cumulative Relative Frequency Distribution Line Plot of the ROI Percentage of all of the R-rated movies.

In [503]:
amount = [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800]
freq = [40, 53, 69, 76, 85, 89, 92, 94, 94, 96, 96, 96, 98, 100]
plt.plot( amount, freq ,color='red', marker='o')
plt.title('Cumulative relative frequency (%) of \n the ROI Percentage made on R-rated movies', fontsize=14)
plt.xlabel('Amount of ROI', fontsize=14)
plt.ylabel('Cumulative relative frequency (%)', fontsize=14)
plt.grid(True)
plt.show()
In [504]:
# Normal distribution curves for the Cost, ROI and ROI % of R-rated movies
import numpy as np
import matplotlib.pyplot as plt

def make_gauss(N, sig, mu):
    # Gaussian density scaled by N, with standard deviation sig and mean mu
    return lambda x: N / (sig * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sig**2))

def main():
    ax = plt.figure().add_subplot(1, 1, 1)
    x = np.arange(-300, 300)
    s = [21, 111, 5.9]   # standard deviations, in legend order
    m = [16, 60, 5.1]    # means, in legend order
    c = ['b', 'r', 'g']

    # One curve per (standard deviation, mean, color) triple
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2)

    plt.xlim(-300, 300)
    plt.ylim(0, 0.07)
    plt.title('Variability of the Cost, ROI, ROI Percentage of R-rated Movies\n Normal Distribution', fontsize=14)
    plt.xlabel("Values of the Cost, ROI, ROI Percent of R-rated Movies", fontsize=14)
    plt.ylabel("Density", fontsize=14)
    plt.legend(['Cost', 'ROI', 'ROI%'], loc='best')
    plt.grid(False)
    plt.show()

if __name__ == '__main__':
    main()

1. Conclusion: Return On Investment

The Distribution of the Budgets of
all the R-rated movies in the Drama genre.

X-axis: Cost of R-rated Movies
Y-axis: Density

The Variance of the Budgets of all the
R-rated movies in the Drama genre.

X-axis: Ranking of Values
Y-axis: Cost of R-rated Movies

The Variance with Standard Deviations of the
Budgets of all the R-rated movies in the Drama genre.

X-axis: Position of Values
Y-axis: Cost of R-rated Movies

The Arithmetic Mean of the Budgets of all the R-rated Drama movies in this dataset is $16,450,866 and the Standard Deviation is $20,757,148, based on the 55 R-rated Drama movies used for this analysis. The Arithmetic Mean is the sum of all observations in a data set divided by the number of observations; in other words, it is a measure of the central value of a sample. The Standard Deviation measures the dispersion of a data set relative to its mean, and it is also used as a measure of the riskiness of a decision. The larger the Standard Deviation, the more the data are spread around the mean, which, depending on the situation, can indicate high risk. The graph above is the Distribution of the Budgets of the R-rated Drama movies. The distribution is shifted to the right, which reflects a large mean, and it is more stretched out than a Normal Distribution, which reflects a large standard deviation. The shift to the right, the extra width, and a standard deviation that is greater than the arithmetic mean all indicate high variation between the values and a non-normal distribution. In other words, the data points, which are the budgets of the R-rated Drama movies, are spread far from the arithmetic mean. What does this imply? The budget needed to produce an R-rated Drama movie is inconsistent: it can be far more expensive or far less expensive than the arithmetic mean of $16,450,866. But where does the majority of the data set lie, on the expensive side or the inexpensive side of the spectrum?
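As a rough illustration of the two statistics described above, the sketch below computes a mean and a standard deviation with Python's standard `statistics` module. The `sample_budgets` list is hypothetical and is not the project's data; only `reported_cv` uses the figures quoted in the text. Whether the notebook used the sample or the population standard deviation is not stated, so the sample version here is an assumption.

```python
import statistics

# Hypothetical budgets in dollars (illustrative only, not the project's data).
sample_budgets = [100_000, 2_000_000, 9_000_000, 25_000_000, 100_000_000]

sample_mean = statistics.mean(sample_budgets)
sample_sd = statistics.stdev(sample_budgets)   # sample SD (divides by n - 1)

# Coefficient of variation: SD relative to the mean. A value above 1 means
# the SD is larger than the mean, as it is for the figures quoted in the text.
sample_cv = sample_sd / sample_mean
reported_cv = 20_757_148 / 16_450_866

print(f"sample mean = {sample_mean:,.0f}, sample SD = {sample_sd:,.0f}")
print(f"sample CV = {sample_cv:.2f}, reported CV = {reported_cv:.2f}")
```

A coefficient of variation above 1 is one quick way to express "the standard deviation is greater than the mean" as a single number.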

The Variance of the Budgets of all the R-rated Drama movies is 430,859,201,847,042 (in squared dollars), based on the same 55 R-rated Drama movies. The Variance is a measure of dispersion: it tells you how far the data points are spread from one another in relation to the mean, and the larger the variance, the more spread out they are. Here the Variance is very large, in the hundreds of trillions, which shows that the data points are far from one another and may also indicate that there are many outliers in the data set. Because the data points are so spread out, the budget of an R-rated Drama movie is inconsistent: it can be very expensive to produce one, or it can fall on the inexpensive side of the spectrum. The graph above visualises the variance, with the blue line as the Arithmetic Mean (about $16.45 million), which is the average Budget of the R-rated Drama movies, and the red dots as the data points. In this graph 17 out of 55 movies had Budgets larger than the Arithmetic Mean and 38 out of 55 movies had Budgets smaller than it. This tells us that even though the expense of creating R-rated Drama movies is inconsistent because of its high variability, the majority of the data sits on the inexpensive end: far more movies spent less than the mean than spent more. There is therefore a high chance that anyone planning to produce an R-rated Drama movie will not have to spend as much as the mean.
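The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text (the variance, the standard deviation, and the 17/38 split around the mean); the variable names are illustrative.

```python
import math

n_movies = 55
reported_variance = 430_859_201_847_042   # squared dollars, as quoted
reported_sd = 20_757_148                  # dollars, as quoted

# The standard deviation is the square root of the variance.
sd_from_variance = math.sqrt(reported_variance)

# Split of the movies around the arithmetic mean, as quoted.
above_mean, below_mean = 17, 38
share_below = below_mean / n_movies

print(f"sqrt(variance) = {sd_from_variance:,.0f}")
print(f"share of movies below the mean = {share_below:.1%}")
```

Roughly 69% of the movies sit below the mean, which is what the paragraph describes as the data leaning toward the inexpensive end.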

What is a Normal Distribution? The Normal Distribution got its name because early statisticians noticed the same bell-curve shape coming up over and over again in different distributions. It is also the most common type of distribution assumed in technical stock market analysis. What is the Empirical Rule and how is it applied to Normal Distributions? The Empirical Rule says that, for a Normal Distribution, almost all observed data will fall within three standard deviations of the mean.

The Empirical Rule is also referred to as the three-sigma rule or the 68-95-99.7 rule because:
  • 68.3% of the data points should fall between -1 SD and +1 SD from the mean
  • 95.5% of the data points should fall between -2 SD and +2 SD from the mean
  • 99.7% of the data points should fall between -3 SD and +3 SD from the mean
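The percentages in the rule come from the normal distribution itself. As a minimal sketch, the share of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)); the function name `share_within` is illustrative.

```python
import math

def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```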

  • The graph above is a scatter plot of the Budgets of all the R-rated Drama movies from the dataset. It also shows a blue line, which is the Arithmetic Mean or CL (Central Line) of the data, together with the UCL (Upper Control Limit), the LCL (Lower Control Limit) and the three standard deviations from the mean.

    In this scatter plot:
  • 84% of the data points fall between -1 SD and +1 SD from the mean
  • 94% of the data points fall within +2 SD of the mean
  • 98% of the data points fall within +3 SD of the mean
  • 2% of the data points are outside the Upper Control Limit (UCL)

  • This shows that the data does not follow a normal distribution, and it confirms that the budgets for creating R-rated Drama movies are skewed, which means the majority of the budgets are inexpensive. The data stops at -1 SD on the low side, so the lower bound is -1 SD. On the high side the data stops at +3 SD, but the upper bound is taken as +2 SD rather than +3 SD, because the one data point above the UCL is out of control compared to the variation of the other data points and should be avoided. That data point is a budget of $100 million spent on producing an R-rated Drama movie, so $100 million appears to be the figure to avoid when producing R-rated Drama movies. The lower bound of the budget is $100,000 and the true upper bound is $61 million. This means the lowest budget for producing an R-rated Drama movie should be $100,000 and the highest should be $61 million. But why?
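The control limits discussed above can be sketched from the mean and standard deviation quoted earlier. This assumes the usual convention that the UCL and LCL sit three standard deviations above and below the central line; the helper names are illustrative.

```python
mean = 16_450_866   # arithmetic mean of the budgets, as quoted
sd = 20_757_148     # standard deviation of the budgets, as quoted

def limit(k: float) -> float:
    """Budget value that lies k standard deviations from the mean."""
    return mean + k * sd

ucl = limit(3)    # Upper Control Limit (assumed to be mean + 3 SD)
lcl = limit(-3)   # Lower Control Limit (assumed to be mean - 3 SD)

def is_out_of_control(budget: float) -> bool:
    """True when a budget falls outside the control limits."""
    return budget > ucl or budget < lcl

print(f"+2 SD = {limit(2):,.0f}, UCL = {ucl:,.0f}")
print(is_out_of_control(100_000_000), is_out_of_control(61_000_000))
```

Because `limit(-1)` is already negative, every budget lies within one standard deviation on the low side, which matches the observation that the data stops at -1 SD.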

    The Distribution of the Budgets
    of all the R-rated movies in the Drama genre.


    X-axis: Cost of R-rated Movies
    Y-axis: Density

    The Variance of the Budgets
    of all the R-rated movies in the Drama genre.


    X-axis: Cost of R-rated Movies
    Y-axis: Density

    The Distribution of the Budgets of all the R-rated Drama movies is not a Normal Distribution, so the Empirical Rule (the 68-95-99.7 rule) does not apply to this particular dataset. The Empirical Rule does not apply to all data sets, only to those that are bell-shaped, and even then it is stated in terms of approximations. Chebyshev's Theorem, a method that applies to every data set, will be used to break down the Distribution instead. The theorem estimates the minimum proportion of observations that fall within a specified number of standard deviations from the mean. It applies to a broad range of probability distributions and helps you determine where most of the data fall within a distribution of values.

    Chebyshev's Theorem states that at least 1 - 1/k² of the data points fall within k SD of the mean, for any k > 1:
  • ≥50% of the data points should fall between -1.41 SD and +1.41 SD from the mean (k = √2; for k = 1 the theorem gives no guarantee)
  • ≥75% of the data points should fall between -2 SD and +2 SD from the mean
  • ≥88.9% of the data points should fall between -3 SD and +3 SD from the mean
  • In other words, at least about 89% of the observations fall inside the range of three standard deviations around the mean, and no more than about 11% fall outside it.

  • A significant difference with Chebyshev's Theorem is that it gives minimum proportions inside a range (and therefore maximum proportions outside it). It uses the words "at least" when giving the proportion of the data which must lie within a given number of standard deviations of the mean; the true proportions found within the indicated regions could be greater than what the theorem guarantees.
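The bound behind Chebyshev's Theorem is a one-line formula: at least 1 - 1/k² of any data set lies within k standard deviations of the mean. The sketch below evaluates it for a few values of k; the function name is illustrative, and returning 0 for k ≤ 1 reflects that the theorem guarantees nothing in that range.

```python
def chebyshev_min_share(k: float) -> float:
    """Minimum share of any data set within k standard deviations of the mean."""
    if k <= 1:
        return 0.0   # the bound is trivial for k <= 1
    return 1 - 1 / k**2

for k in (2 ** 0.5, 2, 3):
    print(f"k = {k:.2f}: at least {chebyshev_min_share(k):.1%}")
```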

    Based on Chebyshev's Theorem:
  • At least 75% (41 observations) of the budgets of the R-rated Drama movies in the data set range from $100,000 to $58 million.
  • At most 25% (14 observations) of the budgets of the R-rated Drama movies in the data set are either between $100,000 and $1 million or between $58 million and $100 million.
  • At most 13.9% (6 observations) of the budgets of the R-rated Drama movies in the data set range from $58 million to $80 million.
  • At most 11% (4 observations) of the budgets of the R-rated Drama movies in the data set are greater than $80 million.
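The dollar ranges in this list follow from the mean and standard deviation quoted earlier. As a hedged sketch, the lower end is floored at the smallest observed budget ($100,000), because mean - k·SD is negative for this data set; `chebyshev_range` is an illustrative helper, not part of the notebook.

```python
mean = 16_450_866       # as quoted
sd = 20_757_148         # as quoted
min_budget = 100_000    # smallest budget in the data set, as quoted

def chebyshev_range(k: float):
    """Return (low, high, minimum share) for the band mean +/- k SD."""
    low = max(min_budget, mean - k * sd)   # budgets cannot go below the minimum
    high = mean + k * sd
    return low, high, 1 - 1 / k**2

for k in (2, 3):
    low, high, share = chebyshev_range(k)
    print(f"k = {k}: ${low:,.0f} to ${high:,.0f}, at least {share:.1%}")
```

The k = 2 band ends just under $58 million and the k = 3 band ends near $79 million, which the list rounds to $80 million.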
    The Skewness of the Budgets of
    all the R-rated Movies in the Drama genre.

    X-axis: Cost of R-rated Movies
    Y-axis: Frequency

    The Mean, Median and Mode in relation to the
    Skewness of all the R-rated Movies in the Drama genre.

    X-axis: Cost of R-rated Movies
    Y-axis: Frequency

    The Mean, Median and Mode in relation to the
    Skewness of all the R-rated Movies in the Drama genre.

    X-axis: Theoretical Quantiles
    Y-axis: Ordered Values

    The Kernel Density Estimate on the Skewness of the
    Budgets of all the R-rated movies in the Drama genre.

    X-axis: Cost of R-rated Movies
    Y-axis: Density

    The first graph above is a histogram of the Budgets of all the R-rated Drama movies in this data set, shown with the Pearson's Coefficient of Skewness of the data set. The histogram is a graphical representation of the budgets organized into specified ranges. It condenses the data into ranges, grouped into columns along the horizontal x-axis, while the vertical y-axis represents the count or percentage of occurrences in the data for each column. The columns visualize the pattern of the budgets spent on producing R-rated Drama movies. Histograms are commonly used to show how many values of a variable occur within a specific range.

    Pearson's Coefficient of Skewness is a method created by Karl Pearson to indicate whether a data set is skewed, using the mean together with the mode or the median of the data set. There are two methods. The first subtracts the mode from the mean and divides the result by the standard deviation. The second method will be used for this analysis.

    The second method of Pearson's Coefficient of Skewness is calculated by multiplying the difference between the mean and the median by three, then dividing the result by the standard deviation. A value of zero means the distribution has no skewness at all, a positive value means the distribution is positively (right) skewed, and a negative value means the distribution is negatively (left) skewed. The Pearson's Coefficient of Skewness of the Budgets of all the R-rated Drama Movies is 1.07; this is a positive number, which means the distribution is right-skewed.
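The second coefficient can be reproduced from the summary statistics quoted in this section (mean $16,450,866, median $9 million, standard deviation $20,757,148). This is a minimal sketch; the function name is illustrative.

```python
def pearson_second_skewness(mean: float, median: float, sd: float) -> float:
    """Pearson's second coefficient of skewness: 3 * (mean - median) / SD."""
    return 3 * (mean - median) / sd

skew = pearson_second_skewness(mean=16_450_866, median=9_000_000, sd=20_757_148)
direction = "right" if skew > 0 else "left" if skew < 0 else "none"
print(f"skewness = {skew:.3f} ({direction}-skewed)")
```

The result is a little under 1.08, in line with the 1.07 reported above.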

    As you can see, the histogram is right-skewed. A right-skewed distribution is asymmetrical. The budgets of R-rated Drama movies have a natural limit of $0, because you cannot spend less than $0 on producing any movie or product. This limit prevents outcomes on one side (the negative side), so the peak of the distribution sits off center toward the limit and a tail stretches away from it, making the distribution skewed.

    Comparing the mean, median and mode is another way to find out whether the data is skewed, and whether the skew is positive or negative.
  • If the mean is greater than the mode or median the distribution is positively skewed.
  • If the mean is less than the mode or the median the distribution is negatively skewed.
  • The second visualization emphasises the comparison of the mode, median and mean in order to show the skewness. In the graph the mode of the budgets of R-rated Drama movies comes first (the lowest value), then the median, then the mean. The mean is $16.4 million, the median is $9 million and the mode is $2 million; of the three statistics, the mean is the largest and the mode is the smallest. Generally, if the distribution of the data is skewed to the right, the mode is often less than the median, which is less than the mean. In a symmetric distribution, we expect the mean and median to be equal in value. This is a significant connection between the shape of the distribution and the relationship between the mean and median.
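The ordering rule above can be sketched with the standard `statistics` module. The `sample_budgets_m` list is hypothetical (values in millions of dollars) and only illustrates the mode < median < mean pattern of a right-skewed sample; classifying by the mean-median comparison is a simplifying assumption.

```python
import statistics

# Hypothetical budgets in millions of dollars (illustrative only).
sample_budgets_m = [2, 2, 2, 5, 9, 10, 25, 60]

mean = statistics.mean(sample_budgets_m)
median = statistics.median(sample_budgets_m)
mode = statistics.mode(sample_budgets_m)

def skew_direction(mean: float, median: float) -> str:
    """Classify skew by comparing the mean with the median."""
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "none"

print(f"mode = {mode}, median = {median}, mean = {mean}")
print(f"skew: {skew_direction(mean, median)}")
```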

    The Distribution using the Violin Plot and
    Jittered plot of all the R-rated Movies in the Drama Genre

    X-axis: Cost of R-rated Movies
    Y-axis: Density

    The Distribution using the Violin Plot and
    Swarm plot of all the R-rated Movies in the Drama Genre.

    X-axis: Cost of R-rated Movies
    Y-axis: Density

    The Distribution using the Violin Plot and
    Rug plot of all the R-rated movies in the Drama Genre

    X-axis: Cost of R-rated Movies
    Y-axis: Density

    The Central Tendency using the Violin Plot and
    Box plot of all the R-rated movies in the Drama Genre

    X-axis: Cost of R-rated Movies
    Y-axis: Density

    The graphs above are all violin plots. A violin plot is a hybrid of a box plot and a kernel density plot and is used to visualize the distribution of numerical data. Whereas box plots only show summary statistics, violin plots depict summary statistics and the density of each variable.

    The Distribution using the Violin Plot and Rug plot of all the R-rated movies in the Drama Genre in this dataset:

    A Rug plot is a plot of the points of a single quantitative variable, displayed as marks along just the x-axis or just the y-axis. Like the other plots, it is used to visualise the distribution of the dataset, and it can be seen as a one-dimensional scatter plot. Based on the violin KDE and rug plot of the budgets of the R-rated Drama movies, there is a higher probability that producers will spend around $8 million to create an R-rated movie in the Drama genre, and a lower probability that $100 million will be spent.

    The Distribution using the Violin Plot and Swarm plot of all the R-rated Movies in the Drama Genre in this dataset:

    A swarm plot, also referred to as a bee swarm plot, is similar to the strip plot in that it plots all the data points on the graph. Swarm plots prevent points from obscuring one another by calculating non-overlapping positions instead of adding random jitter. This arrangement gives them the appearance of a swarm of bees, which is where the name comes from. Based on the KDE and swarm plot of the budgets of the R-rated Drama movies, there are roughly 9 groups of 2-3 elements that share the exact same budget value. About 2/3 of the groups are found between $0 and $10 million. The remaining 1/3 of the groups are found between $20 million and $40 million and between $50 million and $60 million.

    The Distribution using the Violin Plot and Jittered plot of all the R-rated Movies in the Drama Genre in this dataset:
    A jittered strip plot is a variation of the strip plot that gives a clearer view of overlapping data points. It is used to visualise the distribution of many individual one-dimensional values. The values are plotted as dots along the x-axis and the dots are then shifted randomly along the y-axis so that the data points do not overlap. Based on the KDE and jittered strip plot of the budgets of the R-rated Drama movies, there are 5 main parts of the jittered plot where bodies of data separate from one another.
  • The first segregated group is between $0 and $10 million.
  • The second segregated group is between $20 million and $25 million.
  • The third segregated group is between $30 million and $40 million.
  • The fourth segregated group is between $50.5 million and $60 million.
  • The last segregated group has only one element, which is considered an outlier; it has the value of $100 million.

    The Central Tendency using the Violin Plot and Box plot of all the R-rated movies in the Drama Genre in this dataset:

    A Box plot, also referred to as a box and whisker plot, was created to represent the spread and center of a data set; it shows you how your data is spread out. Measures of spread include the interquartile range and the range of the data set. Measures of center include the mean and the median of the data set.

    Reading the Box plot:
    Based on the KDE and Box plot of the budgets of the R-rated Drama movies, the interquartile range runs from roughly $2 million to $20 million, which is a measure of where the bulk of the budgets of the R-rated Drama movies lie. The box within the box plot holds the middle 50% of the data, so 50% of the budgets for producing R-rated Drama movies are between about $2.35 million (Q1) and $20.5 million (Q3).
    The Minimum: The minimum is the bottom of the graph, at the tip of the bottom whisker for the budgets of all the R-rated Drama movies in this data set.
    The Q1, the first quartile: Q1 is represented by the bottom side of the box.
    The Median: The median is represented by the horizontal bar in the box of the box plot.
    The Q3, the third quartile: Q3 is the top edge of the box in the box plot.
    The Maximum: The maximum would have been the end of the top "whisker", but because there are outliers in this data set the maximum is the furthest outlier in this graph.

    Conclusion from the Box plot based on the Budgets of all the R-rated Drama Movies in the data set:
    The Minimum: The minimum is $100,000.
    The Q1, the first quartile: The Q1 is $2,350,000, which means 25% of the budgets have a value lower than Q1 and 75% of the budgets have a value larger than Q1.
    The Median: The median is $9 Million, which is the center value of the budgets of all the R-rated Drama Movies in the data set.
    The Q3, the third quartile: The Q3 is $20,500,000, which means 75% of the budgets have a value lower than Q3 and 25% of the budgets have a value larger than Q3.
    The Interquartile Range: The interquartile range is $18,150,000, the difference between the upper and lower quartiles. This is where the majority of the budgets of the R-rated Drama movies lie.
    The Maximum: The maximum is $100 Million.
    The Outliers: There are 4 distinct outlier values: $100 Million, $61 Million, $60 Million and $55 Million (3 movies share this last value).
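The outliers listed above are consistent with the common 1.5 × IQR rule for box plot whiskers. That the notebook's plots used this default is an assumption; the quartile values below are the ones quoted in the text, and Q1 is recovered as Q3 minus the interquartile range.

```python
q3 = 20_500_000     # third quartile, as quoted
iqr = 18_150_000    # interquartile range, as quoted
q1 = q3 - iqr       # first quartile implied by Q3 and the IQR

# Fences under the 1.5 * IQR convention (assumed, not confirmed by the text).
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Outlier values listed in the text.
high_values = [55_000_000, 60_000_000, 61_000_000, 100_000_000]
outliers = [v for v in high_values if v > upper_fence]

print(f"Q1 = {q1:,}, upper fence = {upper_fence:,.0f}")
print(f"outliers: {outliers}")
```

Since the lower fence is negative, no budget can be an outlier on the low side, so all outliers sit in the upper tail.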

    Frequency Distribution Table
    of all the R-rated Movies in the Drama Genre

    Cumulative Frequency Distribution Table
    of all the R-rated Movies in the Drama Genre

    Frequency Distribution Table of all the R-rated Movies in the Drama Genre:
    The Frequency (f) of a particular value is the number of times the value occurs in the data. The distribution of a set of elements is the pattern of their frequencies, that is, the set of all values in a dataframe together with their associated frequencies. Frequency distributions are portrayed through frequency tables or charts.

    Looking at this frequency distribution table, out of the 55 movies produced under the 'R' rating in the Drama genre, 8 movies spent $2 million each and 8 movies spent $10 million each, making these the most common budgets for producing an R-rated Drama movie. The second most common budget is $25 million, spent by 7 movies. This tells us that a budget between $2 million and $20 million is most likely, as it covers 70% of the data set. A budget between $31 million and $100 million is unlikely, as it covers 18% of the data set.
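A frequency table of this kind can be sketched with `collections.Counter`. The `sample_budgets_m` list is hypothetical (values in millions of dollars) and is not the project's data.

```python
from collections import Counter

# Hypothetical budgets in millions of dollars (illustrative only).
sample_budgets_m = [2, 2, 10, 25, 2, 10, 5]

freq = Counter(sample_budgets_m)

# Print the table sorted by budget value.
for value, count in sorted(freq.items()):
    print(f"${value}M: {count}")

most_common_value, most_common_count = freq.most_common(1)[0]
print(f"most frequent budget: ${most_common_value}M ({most_common_count} movies)")
```

With a pandas Series, `value_counts()` produces the same kind of table.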

    Cumulative Frequency Distribution Table of all the R-rated Movies in the Drama Genre:
    A cumulative frequency distribution table is a more detailed table than a frequency table. It looks much like a frequency distribution table but has additional columns that show the cumulative frequency and the cumulative percentage of the data. The cumulative frequency column represents the sum of a class and all the classes below it; the cumulative frequency of a value of a variable is the number of values in the collection of data less than or equal to that value.
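As a small sketch of the definition above, the cumulative columns can be built by taking a running total of the frequencies. The `sample_budgets_m` list is hypothetical (values in millions of dollars).

```python
from collections import Counter
from itertools import accumulate

# Hypothetical budgets in millions of dollars (illustrative only).
sample_budgets_m = [2, 2, 10, 25, 2, 10, 5, 60]

freq = Counter(sample_budgets_m)
values = sorted(freq)
counts = [freq[v] for v in values]

# Running total of the frequencies, then the same as a percentage of all rows.
cum_counts = list(accumulate(counts))
cum_pct = [100 * c / len(sample_budgets_m) for c in cum_counts]

for v, f, cf, cp in zip(values, counts, cum_counts, cum_pct):
    print(f"${v}M  f={f}  cumulative f={cf}  cumulative %={cp:.1f}")
```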

    Looking at the Cumulative Frequency Distribution Table, out of the 55 movies analyzed, 75% of the budgets spent on producing R-rated Drama movies are between $90,000 and $20 million. Ergo it is likely that studios will spend between $90,000 and $20 million producing an R-rated Drama movie. Breaking down that 75%, 55% of the movies spent $90,000 to $10 million and 20% of the movies spent $10 million to $20 million. Of the movies that spent between $90,000 and $20 million, 73% are on the lower end of the spectrum, spending $90,000 to $10 million, and 27% are on the higher end, spending $10 million to $20 million. Ergo it is likely that studios producing R-rated Drama movies will spend between $90,000 and $10 million, and less likely that they will spend $10 million to $20 million. The expenses for creating R-rated Drama movies lean toward the lower end of the spectrum.

    Looking further at the Cumulative Frequency Distribution Table, 96% of the movies spent between $90,000 and $60 million producing R-rated Drama movies. Breaking down that 96%, 21% of the data is above $20,080,001 and below $60,080,005, so 21% of the movies spent $20 million to $60 million. Of the movies that spent $90,000 to $60 million, 22% spent $20,080,002 to $60,080,005. Ergo it is unlikely for the budget of an R-rated Drama movie to be between $20 million and $60 million.

    100% of the movies spent between $90,000 and $100 million producing R-rated Drama Movies, and 4% of those movies spent $60 million to $100 million. Within that 4%, a budget range that is exceptionally unlikely when producing R-rated Drama movies, 2% spent $60 million to $70 million and 2% spent $90 million to $100 million, with a $30 million gap between those two intervals. This shows that the data is more spread out on the higher end of the spectrum. This end of the spectrum is the least expected outcome, and it may also be the most avoided choice when deciding how much to spend on an R-rated Drama movie. Ergo it is extremely unlikely for studios that produce R-rated Drama movies to spend between $60 million and $100 million.
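The breakdowns in the preceding paragraphs divide one cumulative percentage by another to get a share within a band. A minimal sketch of that arithmetic, using only percentages quoted in the text (the helper name is illustrative):

```python
def share_within_band(part_pct: float, band_pct: float) -> float:
    """Percentage of a band (band_pct of all movies) taken up by part_pct."""
    return 100 * part_pct / band_pct

# Within the 75% of movies that spent up to $20 million:
lower_half = share_within_band(55, 75)   # spent up to $10 million
upper_half = share_within_band(20, 75)   # spent $10 million to $20 million

# Within the 96% of movies that spent up to $60 million:
mid_range = share_within_band(21, 96)    # spent $20 million to $60 million

print(f"{lower_half:.1f}%, {upper_half:.1f}%, {mid_range:.1f}%")
```

The two shares of the same band add up to 100%, which is a quick sanity check on any such breakdown.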

    The Bernoulli Distribution of the Budgets
    of R-rated Drama Movies that are Micro-Budgets

    X-axis: Values of Random Variable X (0,1)
    Y-axis: Probability

    The Bernoulli Distribution of the Budgets
    of R-rated Drama Movies that are Low-Budgets

    X-axis: Values of Random Variable X (0,1)
    Y-axis: Probability

    The Bernoulli Distribution of the Budgets
    of R-rated Drama Movies that are Mid-Budgets

    X-axis: Values of Random Variable X (0,1)
    Y-axis: Probability

    The Bernoulli Distribution of the Budgets
    of R-rated Drama Movies that are High-Budgets

    X-axis: Values of Random Variable X (0,1)
    Y-axis: Probability

    After using the Cumulative Frequency Distribution Table to estimate the likelihood of the Budget that will be spent creating R-rated Drama Movies, the Bernoulli Distribution will be used to predict what is expected to be spent and the probability of spending it. The Bernoulli Distribution is named after Jacob Bernoulli, a Swiss mathematician. It is a special case of the Binomial Distribution in which a single trial is conducted, so that the number of observations is 1. It is a discrete probability distribution with only two possible values for the random variable, that is, two possible outcomes of a single trial. The two outcomes are labeled n=0 and n=1, where n=1 means success, which occurs with probability p, and n=0 means failure, which occurs with probability q = 1-p. The probability mass function (PMF) of a Bernoulli Distribution is defined as follows: if a trial has only two possible outcomes, "success" and "failure", and p is the probability of success, then P(X = 1) = p and P(X = 0) = 1 - p.

    Movie budgets fall into 4 categories: Micro-Budget, Low-Budget, Mid-Budget and High-Budget. The budgets of the 55 R-rated Drama Movies were put into those four categories, and the probability mass function of the Bernoulli Distribution was then used to give the probability of a budget falling in each category. The four categories are what is expected to be spent when creating R-rated Drama Movies, and the PMF gives the probability of each category actually happening. The four graphs above are the Bernoulli Distributions for each of the 4 categories (micro, low, mid, high) of the Budgets of R-rated Drama Movies.
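The probability mass function described above is small enough to write out directly. This is a minimal sketch; `bernoulli_pmf` is an illustrative name, and the value p = 0.65 is used only as an example input.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for a Bernoulli random variable with success probability p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if x == 1:
        return p        # success
    if x == 0:
        return 1 - p    # failure
    return 0.0          # any other value is impossible

p = 0.65
expected_value = p          # E[X] = p
variance = p * (1 - p)      # Var[X] = p(1 - p)
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p), expected_value, variance)
```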

    The first graph is the Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Micro-Budgets. Its purpose is to give the probability that the budget of a new R-rated Drama movie would be a micro-budget. The y-axis is the probability and the x-axis holds the values of the random variable X, where failure is labeled 0 and success is labeled 1. For micro-budgets, the probability of success (1) is 0.036 and the probability of failure (0) is 0.964. The expected value of the random variable X is E[X] = p, so with p = 0.036, E[X] = 0.036.

    The second graph is the Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Low-Budgets. Its purpose is to give the probability that the budget of a new R-rated Drama movie would be a low-budget. The axes are read the same way: the y-axis is the probability, and on the x-axis failure is labeled 0 and success is labeled 1. For low-budgets, the probability of success (1) is 0.65 and the probability of failure (0) is 0.35. The expected value of the random variable X is E[X] = p, so with p = 0.65, E[X] = 0.65.

    The third graph is the Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Mid-Budgets. Its purpose is to give the probability that the budget of a new R-rated Drama movie would be a mid-budget. The axes are read the same way: the y-axis is the probability, and on the x-axis failure is labeled 0 and success is labeled 1. For mid-budgets, the probability of success (1) is 0.18 and the probability of failure (0) is 0.82. The expected value of the random variable X is E[X] = p, so with p = 0.18, E[X] = 0.18.

    The fourth graph is the Bernoulli Distribution of the Budgets of R-rated Drama Movies that are High-Budgets. Its purpose is to give the probability that the budget of a new R-rated Drama movie would be a high-budget. The axes are read the same way: the y-axis is the probability, and on the x-axis failure is labeled 0 and success is labeled 1. For high-budgets, the probability of success (1) is 0.13 and the probability of failure (0) is 0.87. The expected value of the random variable X is E[X] = p, so with p = 0.13, E[X] = 0.13.

    The Bernoulli Distribution of the Sub-groups within the Budgets of R-rated Drama Movies that are Micro-Budgets

    What is a Budget? What does it mean to be on budget? According to internse research and Investopedia a investing website. A budget is an estimation of revenue and expenses over a specified future period of time and is usually compiled and re-evaluated on a periodic basis. Budgets can be made for a person, a group of people, a buisness, a government, or just about anything else that makes and spends money. Being on a budget means not spending more than what was planned on spending. What happens if there is no budget created, not having an effective budgeting will prevent a buisness or organization from spotting problematic areas, in order to prevent this it requires breaking the numbers down to produce a variety of reports that gives you ongoing information. Not having a budget prevents help that determines excessive costs and determine the best ways to maximize profits. Budgeting enables the business to concentrate on cash flow improving profits and increasing returns on investments.

    What is a Movie Budget? According to Filmlifestyle, a movie budget is a document that outlines the costs of production. A movie budget takes care of pre-production to post-production which are all spectrums of filmmaking. It includes elements such as actors salaries, props and wardrobes, food on set, location fees and filming permits. $5 million to over $200 million is the typical range cost for producing a movie. The first step in filmmaking is to determine what type of budget it will need. Because there are different expenses for different genres and within that genre there are different expenses for different system rating. Based on the Bernoulli Distribution we are going to look at the ranges of budgeting in the Drama Genre and particularly looking at R-rated movies within the Drama genre. Now that we know what is expected to be the budget when creating R-rated Drama Movies from the previous analysis now we will contiue off from the previous analysis and talk about the probability of those expected ranges of budgets.

    From a general perspective on filmmaking budgets, there are 4 main categories when budgeting to produce a movie. Micro-Budget: ranging from $0 to $100,000; Low-Budget: ranging from $100,000 to $15 Million; Mid-Budget: ranging from $15 Million to $50 Million; High-Budget: ranging from $50 Million and up. Previously, the budgets of the R-rated Drama movies were all put into these categories and then the Bernoulli Distribution was executed. According to the previous graphs and analysis using the Bernoulli Distribution, these 4 ranges are the ones to expect when budgeting the production of R-rated Drama movies.

    Conclusion: There is a 5.5% chance that the budget for creating R-rated Drama movies will be a Micro-Budget budget ranging from $0 to $100,000. There is a 12.7% chance that it will be a High-Budget budget ranging from $50 Million and up. There is an 18.2% chance that it will be a Mid-Budget budget ranging from $15 Million to $50 Million. There is a 63.6% chance that it will be a Low-Budget budget ranging from $100,000 to $15 Million. Based on our conclusion, the category with the highest probability is the Low-Budget and the category with the lowest probability is the Micro-Budget. Ergo organizations, businesses, studios or production companies in filmmaking that are planning on producing R-rated Drama movies should expect to spend a Low-Budget budget in the range of $100,000 to $15 Million, and there is a 63.6% chance of that happening.
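
    The category probabilities above are shares of movies falling into each budget range. A minimal sketch of that calculation follows, using only the standard library. The sample budgets are hypothetical, the function names are illustrative, and treating each range as including its lower bound and excluding its upper bound is an assumption, since the text does not say which category a budget of exactly $100,000 belongs to.

```python
from collections import Counter

# Category boundaries as stated in the text.
# Assumption: lower bound inclusive, upper bound exclusive.
CATEGORIES = [
    ("Micro-Budget", 0, 100_000),
    ("Low-Budget", 100_000, 15_000_000),
    ("Mid-Budget", 15_000_000, 50_000_000),
    ("High-Budget", 50_000_000, float("inf")),
]


def budget_category(budget):
    """Return the label of the category that contains this budget."""
    for label, low, high in CATEGORIES:
        if low <= budget < high:
            return label
    raise ValueError(f"budget must be non-negative, got {budget}")


def category_probabilities(budgets):
    """Share of movies in each category (the empirical probability)."""
    counts = Counter(budget_category(b) for b in budgets)
    total = len(budgets)
    return {label: counts[label] / total for label, _, _ in CATEGORIES}


# Hypothetical budgets, for illustration only (not taken from the dataset).
sample_budgets = [
    50_000, 2_000_000, 4_500_000, 12_000_000, 20_000_000,
    75_000_000, 3_000_000, 8_000_000, 35_000_000, 1_500_000,
]
probs = category_probabilities(sample_budgets)
for label, p in probs.items():
    print(f"{label:<13} {p:.1%}")
```

    Applied to the real budget column, the same function would return the four shares that the conclusion reports.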

    The Skewness of the Budgets of
    all the R-rated Movies in the Drama genre.

    X-axis: Cost of R-rated Movies
    Y-axis: Frequency

    The Mean, Median and Mode in relation to the
    Skewness of all the R-rated Movies in the Drama genre.

    X-axis: Cost of R-rated Movies
    Y-axis: Frequency

    The Kernel Density Estimate on the Skewness of the
    Budgets of all the R-rated movies in the Drama genre.

    X-axis: Cost of R-rated Movies
    Y-axis: Density

    The previous analysis explained the categories within the budgets for producing R-rated Drama movies: Micro-Budgets, Low-Budgets, Mid-Budgets and High-Budgets. It also used the Bernoulli Distribution to explain which category had the highest probability and which had the lowest. As we can see, the Low-Budget category had the highest probability of being the budget range spent when creating R-rated Drama movies. To recap the ranges of each category: Micro-Budget: ranging from $0 to $100,000; Low-Budget: ranging from $100,000 to $15 Million; Mid-Budget: ranging from $15 Million to $50 Million; High-Budget: ranging from $50 Million and up. The Micro-Budget category had the lowest probability of being the budget range spent when creating R-rated Drama movies. Now that the four categories and their probabilities have been established, the Bernoulli Distribution will also be used to identify the sub-groups within those four categories and the probability of each sub-group occurring. There will be two probabilities for each sub-group: the first is its probability within the specific category it belongs to, and the second is its probability within the entire data set. The Micro-Budget category is too small a range to have sub-groups; it only has two values, one being $100,000 and the other being $155,000. However, the probability of those two values will be compared to the rest of the data set.

    Now that the approach to this analysis has been established and the four categories have been explained, we will start exploring the sub-groups within the Low-Budget, Mid-Budget and High-Budget categories.

    The Low-Budget Category:
    The Low-Budget category has 3 sub-groups of budget ranges spent producing R-rated Drama movies. The first sub-group ranges from $1 Million to $5 Million, the second from $5 Million to $10 Million, and the third from $10 Million to $15 Million. The Bernoulli Distribution was used to get the probability of each sub-group occurring within this category and within the entire dataset. The first sub-group ($1 Million to $5 Million) has a 57% chance of being the budget when the budget is a Low-Budget budget, and a 36.3% chance of being the budget spent to produce R-rated Drama movies across the entire dataset. The second sub-group ($5 Million to $10 Million) has a 20% chance within the Low-Budget category and a 12.7% chance across the entire dataset. The third sub-group ($10 Million to $15 Million) has a 23% chance within the Low-Budget category and a 14.5% chance across the entire dataset.

    The Mid-Budget Category:
    The Mid-Budget category has 3 sub-groups of budget ranges spent producing R-rated Drama movies. The first sub-group ranges from $15 Million to $20 Million, the second from $20 Million to $30 Million, and the third from $30 Million to $50 Million. The Bernoulli Distribution was used to get the probability of each sub-group occurring within this category and within the entire dataset. The first sub-group ($15 Million to $20 Million) has a 30% chance of being the budget when the budget is a Mid-Budget budget, and a 5.5% chance of being the budget spent to produce R-rated Drama movies across the entire dataset. The second sub-group ($20 Million to $30 Million) has a 40% chance within the Mid-Budget category and a 7.2% chance across the entire dataset. The third sub-group ($30 Million to $50 Million) has a 30% chance within the Mid-Budget category and a 5.5% chance across the entire dataset.

    The High-Budget Category:
    The High-Budget category has 3 sub-groups of budget ranges spent producing R-rated Drama movies. The first sub-group ranges from $50 Million to $60 Million, the second from $60 Million to $70 Million, and the third from $90 Million to $100 Million. The Bernoulli Distribution was used to get the probability of each sub-group occurring within this category and within the entire dataset. The first sub-group ($50 Million to $60 Million) has a 72% chance of being the budget when the budget is a High-Budget budget, and a 9% chance of being the budget spent to produce R-rated Drama movies across the entire dataset. The second sub-group ($60 Million to $70 Million) has a 14% chance within the High-Budget category and a 1.8% chance across the entire dataset. The third sub-group ($90 Million to $100 Million) has a 14% chance within the High-Budget category and a 1.8% chance across the entire dataset.
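
    The two probabilities reported for each sub-group are linked: the probability across the entire dataset is the probability within the category multiplied by the probability of the category itself, P(sub-group) = P(sub-group | category) × P(category). The sketch below checks this with the percentages quoted in the text. The dictionary and function names are illustrative, and because the quoted percentages are rounded, the products are only expected to agree with the reported figures to within about half a percentage point.

```python
# Probabilities quoted in the text, expressed as fractions.
category_prob = {"Low-Budget": 0.636, "Mid-Budget": 0.182, "High-Budget": 0.127}
within_category = {
    "Low-Budget": {"$1M-$5M": 0.57, "$5M-$10M": 0.20, "$10M-$15M": 0.23},
    "Mid-Budget": {"$15M-$20M": 0.30, "$20M-$30M": 0.40, "$30M-$50M": 0.30},
    "High-Budget": {"$50M-$60M": 0.72, "$60M-$70M": 0.14, "$90M-$100M": 0.14},
}


def overall_probabilities(category_prob, within_category):
    """P(sub-group) = P(sub-group | category) * P(category)."""
    overall = {}
    for category, sub_groups in within_category.items():
        for sub_group, conditional in sub_groups.items():
            overall[(category, sub_group)] = conditional * category_prob[category]
    return overall


overall = overall_probabilities(category_prob, within_category)
for (category, sub_group), probability in overall.items():
    print(f"{category:<12} {sub_group:<11} {probability:.1%}")
```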

    Conclusion Part A: The second sub-group in the Low-Budget category, with the range of $5 Million to $10 Million, has the lowest probability within the Low-Budget category. The first sub-group, with the range of $1 Million to $5 Million, has the highest probability within the Low-Budget category. Ergo organizations, businesses, studios or production companies in filmmaking that are planning on producing R-rated Drama movies with a Low-Budget budget should expect to spend between $1 Million and $5 Million, and there is a 57% chance of that happening.

    The first and third sub-groups in the Mid-Budget category, with the ranges of $15 Million to $20 Million and $30 Million to $50 Million, have the same probability, which is the lowest probability within the Mid-Budget category. The second sub-group, with the range of $20 Million to $30 Million, has the highest probability within the Mid-Budget category. Ergo organizations, businesses, studios or production companies in filmmaking that are planning on producing R-rated Drama movies with a Mid-Budget budget should expect to spend between $20 Million and $30 Million, and there is a 40% chance of that happening.

    The second and third sub-groups in the High-Budget category, with the ranges of $60 Million to $70 Million and $90 Million to $100 Million, have the same probability, which is the lowest probability within the High-Budget category. The first sub-group, with the range of $50 Million to $60 Million, has the highest probability within the High-Budget category. Ergo organizations, businesses, studios or production companies in filmmaking that are planning on producing R-rated Drama movies with a High-Budget budget should expect to spend between $50 Million and $60 Million, and there is a 72% chance of that happening.

    Conclusion Part B: The sub-groups with the lowest probability across all budgets of R-rated Drama movies in this data set are the $60-$70 Million sub-group and the $90-$100 Million sub-group, each with a probability of 1.8%. The two sub-groups have exactly the same probability and are both from the High-Budget category. The sub-group with the highest probability across all budgets of R-rated Drama movies in this data set is the $1-$5 Million sub-group, with a probability of 36.3%; this sub-group is from the Low-Budget category. Ergo organizations, businesses, studios or production companies in filmmaking that are planning on producing R-rated Drama movies should expect a budget between $1 Million and $5 Million, and there is a 36.3% chance of that happening.

    In [502]:
    #<center><img  src="a1.png" style="width:23%"> <img src="a2.png" style="width:23%"> <img src="a3.png" style="width:23%"></center>
    #<center><h3 style='color:#fbec5d'>System Rating R:</h3></center><p>The average ROI in system R rating is 21.10. Twenty movies in this rating are above the average,
    #which means 56% of the entire system R rating is above 21.10. The third quartile of the ROI in this system rating is 33.40, marking where the highest ROIs begin; seven movies are above it, making up 23% of the entire system R rating.
    #<span style='color:#fbec5d'>For every dollar spent, how much did each movie make?</span> This statement assumes every movie had the same budget of one dollar and asks how much each movie would generate.
    #The objective is to identify which movie is the most efficient in each system rating and which system rating is the most efficient.
    

    System Rating G:

    The average ROI in system G rating is 17.13 dollars. Twelve movies in this rating are above the average, which means 44% of the entire system G rating is above 17.13 dollars. The third quartile of the ROI in this system rating is 22.05 dollars, marking where the highest ROIs begin; six movies are above it, making up 22% of the entire system G rating.

    System Rating PG:

    The average ROI in system PG rating is 21.13 dollars. Twelve movies in this rating are above the average, which means 44% of the entire system PG rating is above 21.13 dollars. The third quartile of the ROI in this system rating is 31.70 dollars, marking where the highest ROIs begin; seven movies are above it, making up 26% of the entire system PG rating.

    System Rating PG-13:

    The average ROI in system PG-13 rating is 20.94 dollars. Thirteen movies in this rating are above the average, which means 48% of the entire system PG-13 rating is above 20.94 dollars. The third quartile of the ROI in this system rating is 26.45 dollars, marking where the highest ROIs begin; nine movies are above it, making up 33% of the entire system PG-13 rating.

    System Rating NR:

    The average ROI in system NR rating is 26.90 dollars. Sixteen movies in this rating are above the average, which means 60% of the entire system NR rating is above 26.90 dollars. The third quartile of the ROI in this system rating is 38.05 dollars, marking where the highest ROIs begin; eight movies are above it, making up 30% of the entire system NR rating.
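
    The figures quoted for each system rating (the average, the number of movies above it, the third quartile, and the number of movies above that) can be computed with a few lines of standard-library Python. This is a minimal sketch: the input values are hypothetical, `roi_summary` is an illustrative name, and the quartile uses the default method of `statistics.quantiles`, which may differ from whatever the notebook used.

```python
import statistics


def roi_summary(rois):
    """Summarise per-movie returns for one system rating."""
    mean_roi = statistics.mean(rois)
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    third_quartile = statistics.quantiles(rois, n=4)[2]
    above_mean = sum(r > mean_roi for r in rois)
    above_q3 = sum(r > third_quartile for r in rois)
    return {
        "mean": mean_roi,
        "above_mean": above_mean,
        "share_above_mean": above_mean / len(rois),
        "third_quartile": third_quartile,
        "above_q3": above_q3,
        "share_above_q3": above_q3 / len(rois),
    }


# Hypothetical dollars returned per dollar spent, for illustration only.
sample_rois = [1.0, 2.0, 3.0, 4.0, 10.0]
summary = roi_summary(sample_rois)
print(summary)
```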

    Net Profit Margin

    This is the blueprint for creating the second visualization, Gross Profit Margin Percentage. Altair will be used to create this graph.

    Blueprint:

      1. Understanding the dataframe format used to create this data visualization makes the process easier. These are the key components of the dataframe used for this graph.
      • The dataframe that Altair used to create this graph consists of four columns:

        • Yield: This column is the only column that holds numeric information
        • Variety: This column is the only column that contains categories or groupings among the data points in the dataframe
        • Year: This column holds the timeframe of each data point
        • Site: This column differentiates each data point from the rest
      2. Create five dataframes for five different graphs based on the documented format that is supplied by Altair.
      • The dataframes will consist of these six columns:

        • Name: This column shows the name of each movie
        • Type: The graph portrays the profit made compared to the Revenue. The two main elements that reflect that are the Revenue and the Profit, ergo the 'Type' column is made up of those two elements.
        • Yield: This column shows the value of the 'Type' column as an int; it is used to plot the points in the graph.
        • Type Yield: This column shows the 'Yield' column as a currency string; it is mostly used for the tooltip, to make it easier to read how much each element in the 'Type' column is in dollars.
        • System Rating: This column shows the System Rating of each movie.
        • Gross Profit Margin: This column shows the gross profit margin as a percentage.
      3. The style of this graph is a Becker's Barley Trellis Plot, which is found in Altair's Gallery. There are two ring-shaped points in each panel: the first ring shows the Revenue and the other shows the Profit. This setup gives a sense of what fraction of the Revenue each movie actually walks away with, allowing the Profit to be seen from a different angle. This could make the movie look more profitable or seem over-rated. Each ring, when hovered, shows the Name of the movie, the Type (either Profit or Revenue), the Type Yield (the amount of the Type in dollars), the Gross Profit Margin Percentage and the System Rating. The closer the Profit is to the Revenue, the more buoyant it is!
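
    The six-column layout in the blueprint is a "long" format: every movie contributes one Revenue row and one Profit row. The sketch below builds such rows as plain dictionaries, using three R-rated titles and figures taken from the notebook's own outputs. The function name `build_long_records` is illustrative and not part of the notebook.

```python
def build_long_records(names, revenues, profits, rating):
    """Build one 'Revenue' row and one 'Profit' row per movie."""
    records = []
    for kind, values in (("Revenue", revenues), ("Profit", profits)):
        for name, value, revenue, profit in zip(names, values, revenues, profits):
            records.append({
                "Name": name,
                "Type": kind,
                "Yield": value,                 # numeric, used for plotting
                "Type Yield": f"${value:,}",    # currency string for the tooltip
                "System Rating": rating,
                # Truncated to a whole percent, as the notebook does with int().
                "Gross Profit Margin": f"{int(profit / revenue * 100)}%",
            })
    return records


names = ["Django Unchained", "Gone Girl", "Priest"]
revenues = [449948323, 368567189, 84154026]
profits = [349948323, 307567189, 24154026]

records = build_long_records(names, revenues, profits, "R")
for row in records:
    print(row)
```

    Passing `records` to `pandas.DataFrame` would give a dataframe with the six columns listed in the blueprint.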

    This is the 'Drama_DataFrame' dataframe (this dataframe is interactive).

    In [16]:
    Drama_DataFrame
    
    Out[16]:
    Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x Worldwide_Gross Worldwide_Gross_x Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer

    These are the variables needed to create the columns for the first dataframe: 'df1'

    Getting all the Names of the movies that are R-rated from the 'Drama_DataFrame' dataframe.

    In [322]:
    name = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'R' and Drama_DataFrame.Profit[i] > 0:
            name.append(Drama_DataFrame.Movie[i])
    print(name)
    
    ['Django Unchained', 'Gone Girl', 'Priest', 'Fifty Shades Darker', 'Fifty Shades Freed', 'Crimson Peak', 'Zero Dark Thirty', 'Fifty Shades of Grey', 'The Master', 'Flight', 'The Ides of March', 'Nocturnal Animals', 'The Water Diviner', 'For Colored Girls', 'The Debt', 'Let Me In', 'Black Swan', 'Ex Machina', 'Room', 'If Beale Street Could Talk', 'Arbitrage', 'Stoker', 'Carol', 'Quartet', 'Hereditary', 'Melancholia', 'Manchester by the Sea', 'We Need to Talk About Kevin', 'Addicted', 'Mommy', 'Take Shelter', 'Boyhood', 'The Witch', 'Margin Call', 'Whiplash', 'Before Midnight', 'Silent House', "Winter's Bone", 'The Florida Project', 'We Are Your Friends', 'Locke', 'Knock Knock', 'Buried', 'Unsane', 'Blue Valentine', 'Martha Marcy May Marlene', 'Palo Alto', 'Sound of My Voice', 'A Ghost Story', 'Ordinary People', 'Fame', 'Endless Love', 'Ghost Story', 'Zoot Suit', 'Rich and Famous', 'Raggedy Man']
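
    The same filter (a given rating and a positive profit) is repeated for every column that follows. One way to express it once is a small helper that walks the columns in parallel with `zip`, which also avoids relying on the dataframe index matching the position `i`. This is a sketch only: `select_profitable` is a hypothetical name, and the sample rows mix figures from the notebook's outputs with one made-up loss-making title.

```python
def select_profitable(ratings, profits, values, rating):
    """Return `values` for rows with the given rating and a positive profit."""
    return [
        value
        for row_rating, profit, value in zip(ratings, profits, values)
        if row_rating == rating and profit > 0
    ]


# Small stand-in for three columns of the dataframe.
ratings = ["R", "PG", "R", "R"]
profits = [349948323, 47784, -5_000_000, 307567189]  # third value is made up
movies = ["Django Unchained", "Hugo", "Hypothetical Flop", "Gone Girl"]

r_names = select_profitable(ratings, profits, movies, "R")
pg_names = select_profitable(ratings, profits, movies, "PG")
print(r_names)
print(pg_names)
```

    Because iterating a pandas Series yields its values, the helper could be called with the dataframe's columns directly, for example `select_profitable(Drama_DataFrame.Rating, Drama_DataFrame.Profit, Drama_DataFrame.Movie, 'R')`.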
    

    Getting all the Worldwide Revenue in Dollars of the movies that are R-rated from the 'Drama_DataFrame' dataframe.

    In [354]:
    world_cur= []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'R' and Drama_DataFrame.Profit[i] > 0:
            world_cur.append(Drama_DataFrame.Worldwide_Gross_x[i])
    print(world_cur)
    
    ['$449,948,323', '$368,567,189', '$84,154,026', '$381,398,492', '$371,350,619', '$74,966,854', '$134,612,435', '$570,998,101', '$50,647,416', '$160,558,438', '$77,735,925', '$32,398,681', '$31,054,727', '$38,017,873', '$46,604,054', '$28,270,399', '$331,266,710', '$38,358,392', '$36,262,783', '$19,859,167', '$35,830,713', '$12,034,913', '$42,843,521', '$56,178,935', '$70,133,905', '$21,817,298', '$77,733,867', '$10,765,283', '$17,499,242', '$17,536,004', '$4,972,016', '$57,273,049', '$40,454,520', '$20,433,227', '$38,969,037', '$23,251,930', '$16,610,760', '$16,131,551', '$11,295,324', '$10,153,415', '$2,088,390', '$6,328,516', '$21,270,290', '$14,244,931', '$16,566,240', '$5,438,911', '$1,156,309', '$429,448', '$2,769,782', '$54,766,923', '$77,211,836', '$34,718,173', '$1,951,683', '$3,256,082', '$13,000,000', '$11,000,000']
    

    Getting all the Worldwide Revenue in Integer of the movies that are R-rated from the 'Drama_DataFrame' dataframe.

    In [355]:
    world_int = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'R' and Drama_DataFrame.Profit[i] > 0:
            world_int.append(Drama_DataFrame.Worldwide_Gross[i])
    print(world_int)
    
    [449948323, 368567189, 84154026, 381398492, 371350619, 74966854, 134612435, 570998101, 50647416, 160558438, 77735925, 32398681, 31054727, 38017873, 46604054, 28270399, 331266710, 38358392, 36262783, 19859167, 35830713, 12034913, 42843521, 56178935, 70133905, 21817298, 77733867, 10765283, 17499242, 17536004, 4972016, 57273049, 40454520, 20433227, 38969037, 23251930, 16610760, 16131551, 11295324, 10153415, 2088390, 6328516, 21270290, 14244931, 16566240, 5438911, 1156309, 429448, 2769782, 54766923, 77211836, 34718173, 1951683, 3256082, 13000000, 11000000]
    

    Getting all the Profit in Dollars of the movies that are R-rated from the 'Drama_DataFrame' dataframe.

    In [356]:
    profit_cur = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'R' and Drama_DataFrame.Profit[i] > 0:
            profit_cur.append(Drama_DataFrame.Profit_x[i])
    print(profit_cur)
    
    ['$349,948,323', '$307,567,189', '$24,154,026', '$326,398,492', '$316,350,619', '$19,966,854', '$82,112,435', '$530,998,101', '$13,147,416', '$129,558,438', '$54,735,925', '$9,898,681', '$8,554,727', '$17,017,873', '$26,604,054', '$8,270,399', '$318,266,710', '$25,358,392', '$23,262,783', '$7,859,167', '$23,830,713', '$34,913', '$31,043,521', '$45,178,935', '$60,133,905', '$12,417,298', '$69,233,867', '$3,765,283', '$12,499,242', '$12,636,004', '$222,016', '$53,273,049', '$36,954,520', '$17,033,227', '$35,669,037', '$20,251,930', '$14,610,760', '$14,131,551', '$9,295,324', '$8,153,415', '$88,390', '$4,328,516', '$19,282,640', '$12,744,931', '$15,566,240', '$4,438,911', '$156,309', '$294,448', '$2,669,782', '$48,766,923', '$68,711,836', '$14,718,173', '$1,851,683', '$556,082', '$1,500,000', '$2,000,000']
    

    Getting all the Profit in Integer of the movies that are R-rated from the 'Drama_DataFrame' dataframe.

    In [73]:
    profit_int = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'R' and Drama_DataFrame.Profit[i] > 0:
            profit_int.append(int(Drama_DataFrame.Profit[i]))
    print(profit_int)
    
    [349948323, 307567189, 24154026, 326398492, 316350619, 19966854, 82112435, 530998101, 13147416, 129558438, 54735925, 9898681, 8554727, 17017873, 26604054, 8270399, 318266710, 25358392, 23262783, 7859167, 23830713, 34913, 31043521, 45178935, 60133905, 12417298, 69233867, 3765283, 12499242, 12636004, 222016, 53273049, 36954520, 17033227, 35669037, 20251930, 14610760, 14131551, 9295324, 8153415, 88390, 4328516, 19282640, 12744931, 15566240, 4438911, 156309, 294448, 2669782, 48766923, 68711836, 14718173, 1851683, 556082, 1500000, 2000000]
    

    Creating a list consisting of 'R' repeated 56 times for the R-rated category due to it having 56 movies for the new dataframe that will be created below.

    In [358]:
    size = list('R'*56)
    print(size)
    
    ['R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R']
    

    Getting all the Net Profit Margin of the movies that are R-rated from the 'Drama_DataFrame' dataframe.

    In [359]:
    npm = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'R' and Drama_DataFrame.Profit[i] > 0:
            # int() truncates, so a margin of 0.29% is stored as 0.
            npm.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i]) * 100))
    print(npm)
    
    [77, 83, 28, 85, 85, 26, 60, 92, 25, 80, 70, 30, 27, 44, 57, 29, 96, 66, 64, 39, 66, 0, 72, 80, 85, 56, 89, 34, 71, 72, 4, 93, 91, 83, 91, 87, 87, 87, 82, 80, 4, 68, 90, 89, 93, 81, 13, 68, 96, 89, 88, 42, 94, 17, 11, 18]
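
    The margin in the cell above is profit divided by worldwide gross, times 100, truncated to a whole number by `int()`. Truncation is why one movie shows a margin of 0 even though it made a profit. The sketch below shows the effect with two profit and revenue pairs taken from the notebook's outputs; the function names are illustrative, and the one-decimal variant is offered only as a possible alternative, not as what the notebook does.

```python
def margin_truncated(profit, revenue):
    """Net profit margin as a whole percent, truncated like the notebook's int()."""
    return int(profit / revenue * 100)


def margin_one_decimal(profit, revenue):
    """Net profit margin formatted to one decimal place."""
    return f"{profit / revenue * 100:.1f}%"


# (profit, worldwide gross) pairs from the R-rated outputs above.
small = (34913, 12034913)
large = (349948323, 449948323)

print(margin_truncated(*small), margin_one_decimal(*small))
print(margin_truncated(*large), margin_one_decimal(*large))
```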
    

    Converting the list consisting of Net Profit Margin of all the R-rated movies from integer to percentage.

    In [360]:
    npm_percent = []
    for i in npm:
        npm_percent.append("{:}%".format(i))
    print(npm_percent)
    
    ['77%', '83%', '28%', '85%', '85%', '26%', '60%', '92%', '25%', '80%', '70%', '30%', '27%', '44%', '57%', '29%', '96%', '66%', '64%', '39%', '66%', '0%', '72%', '80%', '85%', '56%', '89%', '34%', '71%', '72%', '4%', '93%', '91%', '83%', '91%', '87%', '87%', '87%', '82%', '80%', '4%', '68%', '90%', '89%', '93%', '81%', '13%', '68%', '96%', '89%', '88%', '42%', '94%', '17%', '11%', '18%']
    

    Creating a list consisting of 'Revenue' repeated 56 times and 'Profit' repeated 56 times for the R-rated category, which has 56 movies, for the new dataframe that will be created below.

    In [361]:
    r_rate = []
    for i in list(range(56)):
        r_rate.append('Revenue')
    for i in list(range(56)):
        r_rate.append('Profit')
    print(r_rate)
    
    ['Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit']
    

    These are the variables needed to create the columns for the second dataframe: 'df2'

    Getting all the Names of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.

    In [321]:
    name1 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
            name1.append(Drama_DataFrame.Movie[i])
    print(name1)
    
    ['Hugo', 'Dolphin Tale', 'Wonder', 'The Last Song', 'War Room', 'The Lunchbox', 'Somewhere in Time', 'Urban Cowboy', 'Cinderella', 'War Room', 'Wonder', 'Little Women', 'Overcomer', 'The Jazz Singer', 'A Walk to Remember', 'Tuck Everlasting', 'Dreamer', 'The Lake House', 'Akeelah and the Bee', 'Bridge to Terabithia', 'August Rush', 'Fireproof', 'The Last Song', "God's Not Dead", "Mr. Holland's Opus", 'Phenomenon', 'Contact', 'The Spanish Prisoner', 'Sense and Sensibility', 'The Secret of Roan Inish', 'The Remains of the Day', 'Pure Country', 'Forever Young', 'A River Runs Through It', 'Honeysuckle Rose', 'Resurrection', 'Taps', 'On Golden Pond', 'Absence of Malice', 'The Night the Lights Went Out in Georgia', 'Rocky III', 'Tex', 'Staying Alive', 'Tender Mercies', 'Footloose', 'The Natural']
    

    Getting all the Worldwide Revenue in Dollars of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.

    In [363]:
    world_cur1 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
            world_cur1.append(Drama_DataFrame.Worldwide_Gross_x[i])
    print(world_cur1)
    
    ['$180,047,784', '$96,068,724', '$304,604,712', '$92,678,948', '$73,975,239', '$12,231,500', '$9,709,597', '$46,918,287', '$542,351,353', '$73,986,904', '$305,937,718', '$216,601,214', '$38,102,988', '$27,118,000', '$47,494,916', '$19,344,615', '$38,741,732', '$114,830,111', '$18,948,425', '$137,587,063', '$64,605,762', '$33,473,297', '$89,137,047', '$64,667,874', '$106,269,971', '$152,036,382', '$171,120,329', '$13,835,130', '$134,582,776', '$6,101,815', '$63,954,968', '$15,164,458', '$127,956,187', '$43,440,294', '$17,815,212', '$157,297,525', '$35,856,053', '$119,285,432', '$40,716,963', '$14,923,752', '$125,052,686', '$549,368,315', '$64,892,670', '$8,443,124', '$80,008,942', '$48,000,000']
    

    Getting all the Worldwide Revenue in Integer of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.

    In [364]:
    world_int1= []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
            world_int1.append(Drama_DataFrame.Worldwide_Gross[i])
    print(world_int1)
    
    [180047784, 96068724, 304604712, 92678948, 73975239, 12231500, 9709597, 46918287, 542351353, 73986904, 305937718, 216601214, 38102988, 27118000, 47494916, 19344615, 38741732, 114830111, 18948425, 137587063, 64605762, 33473297, 89137047, 64667874, 106269971, 152036382, 171120329, 13835130, 134582776, 6101815, 63954968, 15164458, 127956187, 43440294, 17815212, 157297525, 35856053, 119285432, 40716963, 14923752, 125052686, 549368315, 64892670, 8443124, 80008942, 48000000]
    

    Getting all the Profit in Dollars of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.

    In [365]:
    profit_cur1 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
            profit_cur1.append(Drama_DataFrame.Profit_x[i])
    print(profit_cur1)
    
    ['$47,784', '$59,068,724', '$284,604,712', '$72,678,948', '$70,975,239', '$10,531,500', '$4,609,597', '$36,918,287', '$447,351,353', '$70,986,904', '$285,937,718', '$176,601,214', '$33,102,988', '$26,696,000', '$35,694,916', '$4,344,615', '$6,741,732', '$74,830,111', '$10,948,425', '$120,587,063', '$34,605,762', '$32,973,297', '$69,137,047', '$62,667,874', '$83,269,971', '$120,036,382', '$81,120,329', '$3,835,130', '$118,582,776', '$3,101,815', '$48,954,968', '$5,164,458', '$107,956,187', '$31,440,294', '$12,815,212', '$150,297,525', '$21,856,053', '$104,285,432', '$28,716,963', '$7,423,752', '$108,052,686', '$544,368,315', '$42,892,670', '$3,943,124', '$71,808,942', '$20,000,000']
    

    Getting all the Profit in Integer of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.

    In [72]:
    profit_int1 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
            profit_int1.append(int(Drama_DataFrame.Profit[i]))
    print(profit_int1)
    
    [47784, 59068724, 284604712, 72678948, 70975239, 10531500, 4609597, 36918287, 447351353, 70986904, 285937718, 176601214, 33102988, 26696000, 35694916, 4344615, 6741732, 74830111, 10948425, 120587063, 34605762, 32973297, 69137047, 62667874, 83269971, 120036382, 81120329, 3835130, 118582776, 3101815, 48954968, 5164458, 107956187, 31440294, 12815212, 150297525, 21856053, 104285432, 28716963, 7423752, 108052686, 544368315, 42892670, 3943124, 71808942, 20000000]
    

    Creating a list consisting of 'PG' repeated 46 times for the PG-rated category, which has 46 movies, for the new dataframe that will be created below.

    In [367]:
    size_1 = []
    for i in list(range(46)):
        size_1.append('PG')
    print(size_1)
    
    ['PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG']
    

    Getting all the Net Profit Margin of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.

    In [368]:
    npm1 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
            npm1.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i])*100))
    print(npm1)
    
    [0, 61, 93, 78, 95, 86, 47, 78, 82, 95, 93, 81, 86, 98, 75, 22, 17, 65, 57, 87, 53, 98, 77, 96, 78, 78, 47, 27, 88, 50, 76, 34, 84, 72, 71, 95, 60, 87, 70, 49, 86, 99, 66, 46, 89, 41]
    

    Converting the list consisting of Net Profit Margin of all the PG-rated movies from integer to percentage.

    In [369]:
    npm1_percent = []
    for i in npm1:
        npm1_percent.append("{:}%".format(i))
    print(npm1_percent)
    
    ['0%', '61%', '93%', '78%', '95%', '86%', '47%', '78%', '82%', '95%', '93%', '81%', '86%', '98%', '75%', '22%', '17%', '65%', '57%', '87%', '53%', '98%', '77%', '96%', '78%', '78%', '47%', '27%', '88%', '50%', '76%', '34%', '84%', '72%', '71%', '95%', '60%', '87%', '70%', '49%', '86%', '99%', '66%', '46%', '89%', '41%']
    

    Creating a list consisting of 'Revenue' repeated 46 times and 'Profit' repeated 46 times for the PG-rated category, which has 46 movies, for the new dataframe that will be created below.

    In [370]:
    pg_rate = []
    for i in list(range(46)):
        pg_rate.append('Revenue')
    for i in list(range(46)):
        pg_rate.append('Profit')
    print(pg_rate)
    
    ['Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit']
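
    A shorter way to build the same label list, sketched here under the assumption that only the count of movies matters, is list repetition and concatenation. The names 'n_movies' and 'pg_rate_alt' are hypothetical.

```python
n_movies = 46  # number of profitable PG-rated movies reported above

# List repetition and concatenation replace the two append loops.
pg_rate_alt = ["Revenue"] * n_movies + ["Profit"] * n_movies
print(len(pg_rate_alt))
```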
    

    These are the variables needed to create the columns for the third dataframe: 'df3'

    Getting all the Names of the movies that are G-rated from the 'Drama_DataFrame' dataframe.

    In [326]:
    name2 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='G' and Drama_DataFrame.Profit[i] > 0:
                name2.append(Drama_DataFrame.Movie[i])
    print(name2)
    
    ['A Sunday in the Country', 'Prancer', 'The Rookie', 'Beauty and the Beast 1991', 'The Little Rascals', 'Ramona and Beezus', 'The Black Stallion', 'The Hunchback of Notre Drame', 'Babe', 'Pollyanna', 'Lassie Come Home', "Charlotte's Web", 'Kit Kittredge: An American Girl', 'The Rookie', 'The Secret Garden', 'The Sound of Music', 'The Tale of Despereaux', 'The Lion King 1994', 'Bambi 1942', 'My Fair Lady 1964', "Hachiko: A Dog's Story", 'Giant', 'The Ten Commandments 1966', 'The Quiet Man', 'Three Cions in the Fountain']
    

    Getting all the Worldwide Revenue in Dollars of the movies that are G-rated from the 'Drama_DataFrame' dataframe.

    In [372]:
    world_cur2 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='G'and Drama_DataFrame.Profit[i] > 0:
                world_cur2.append(Drama_DataFrame.Worldwide_Gross_x[i])
    print(world_cur2)
    
    ['$2,411,143', '$18,587,135', '$80,693,537', '$438,656,843', '$66,947,950', '$27,469,621', '$37,799,643', '$325,500,000', '$246,100,000', '$3,750,000', '$4,517,000', '$143,985,708', '$17,657,973', '$80,491,516', '$311,281,000', '$286,214,195', '$90,482,317', '$986,214,868', '$268,000,000', '$72,071,636', '$47,707,417', '$30,194,409', '$65,500,000', '$7,600,377', '$12,000,000']
    

    Getting all the Worldwide Revenue in Integer of the movies that are G-rated from the 'Drama_DataFrame' dataframe.

    In [373]:
    world_int2= []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='G' and Drama_DataFrame.Profit[i] > 0:
                world_int2.append(Drama_DataFrame.Worldwide_Gross[i])
    print(world_int2)
    
    [2411143, 18587135, 80693537, 438656843, 66947950, 27469621, 37799643, 325500000, 246100000, 3750000, 4517000, 143985708, 17657973, 80491516, 311281000, 286214195, 90482317, 986214868, 268000000, 72071636, 47707417, 30194409, 65500000, 7600377, 12000000]
    

    Getting all the Profit in Dollars of the movies that are G-rated from the 'Drama_DataFrame' dataframe.

    In [374]:
    profit_cur2 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='G' and Drama_DataFrame.Profit[i] > 0:
                profit_cur2.append(Drama_DataFrame.Profit_x[i])
    print(profit_cur2)
    
    ['$1,711,143', '$11,587,135', '$58,693,537', '$418,656,843', '$43,947,950', '$12,469,621', '$35,099,643', '$255,500,000', '$216,100,000', '$1,250,000', '$3,851,000', '$58,985,708', '$7,657,973', '$58,491,516', '$293,281,000', '$278,014,195', '$30,482,317', '$941,214,868', '$267,142,000', '$55,071,636', '$37,707,417', '$23,794,409', '$52,500,000', '$5,850,377', '$10,300,000']
    

    Getting all the Profit in Integer of the movies that are G-rated from the 'Drama_DataFrame' dataframe.

    In [71]:
    profit_int2 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='G' and Drama_DataFrame.Profit[i] > 0:
                profit_int2.append(int(Drama_DataFrame.Profit[i]))
    print(profit_int2)
    
    [1711143, 11587135, 58693537, 418656843, 43947950, 12469621, 35099643, 255500000, 216100000, 1250000, 3851000, 58985708, 7657973, 58491516, 293281000, 278014195, 30482317, 941214868, 267142000, 55071636, 37707417, 23794409, 52500000, 5850377, 10300000]
    

    Creating a list consisting of 'G' repeated 25 times for the G-rated category due to it having 25 movies for the new dataframe that will be created below.

    In [376]:
    size_2 = list('G'*25);print(size_2)
    
    ['G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G']
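
    The list('G'*25) shortcut works here only because 'G' is a single character. The sketch below, with hypothetical variable names, shows what would happen with a longer label such as 'PG' and a form that is safe for any label.

```python
# Works for a one-character label: the string 'GGG' splits into three 'G's.
single = list("G" * 3)

# A multi-character label is split into its characters instead.
split_chars = list("PG" * 2)

# Repeating a one-element list keeps each label whole.
whole = ["PG"] * 2

print(single, split_chars, whole)
```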
    

    Getting all the Net Profit Margin of the movies that are G-rated from the 'Drama_DataFrame' dataframe.

    In [377]:
    npm2 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'G' and Drama_DataFrame.Profit[i] > 0:
            npm2.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i])*100))
    print(npm2)
    
    [70, 62, 72, 95, 65, 45, 92, 78, 87, 33, 85, 40, 43, 72, 94, 97, 33, 95, 99, 76, 79, 78, 80, 76, 85]
    

    Converting the list consisting of Net Profit Margin of all the G-rated movies from integer to percentage.

    In [378]:
    npm2_percent = []
    for i in npm2:
        npm2_percent.append("{:}%".format(i))
    print(npm2_percent)
    
    ['70%', '62%', '72%', '95%', '65%', '45%', '92%', '78%', '87%', '33%', '85%', '40%', '43%', '72%', '94%', '97%', '33%', '95%', '99%', '76%', '79%', '78%', '80%', '76%', '85%']
    

    Creating a list consisting of 'Revenue' repeated 25 times followed by 'Profit' repeated 25 times for the G-rated category, because it has 25 movies, for the new dataframe that will be created below.

    In [379]:
    g_rate = []
    for i in list(range(25)):
        g_rate.append('Revenue')
    for i in list(range(25)):
        g_rate.append('Profit')
    print(g_rate)
    
    ['Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit']
    

    These are the variables needed to create the columns for the fourth dataframe: 'df4'

    Getting all the Names of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.

    In [320]:
    name3 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='PG-13' and Drama_DataFrame.Profit[i] > 0:
                name3.append(Drama_DataFrame.Movie[i])
    print(name3)
    
    ['Gravity', 'Sing', 'Contagion', 'Burlesque', 'Creed II', 'The Post', 'Hereafter', 'Anna Karenina', 'Arrival', 'Charlie St. Cloud', 'Bridge of Spies', 'The Impossible', 'Water for Elephants', 'Creed', 'The Rite', 'Collateral Beauty', 'True Grit', 'The Tree of Life', 'The Longest Ride', 'Step Up Revolution', 'The Vow', 'The Age of Adaline', 'Safe Haven', 'The Best of Me', 'The Help', 'Dear John', 'The Lucky One', 'The Giver', 'Draft Day', 'Rings', 'Fences', 'Me Before You', 'The Light Between Oceans', 'The Book Thief', 'A Quiet Place', 'Beastly', 'The Roommate', 'Remember Me', 'The Woman in Black', 'Country Strong', 'One Day', 'Suffragette', 'The Perks of Being a Wallflower', 'Project Almanac', 'Wish Upon', 'If I Stay', 'Brooklyn', 'Everything, Everything', 'Mud', 'Amour', 'Ouija: Origin of Evil', 'Black or White', 'The Bye Bye Man', 'Gifted', 'The Words', 'Lights Out', 'Still Alice', 'Before I Fall', 'Rabbit Hole', 'Ida', 'Courageous', 'Mustang', 'Like Crazy', 'Another Earth']
    

    Getting all the Worldwide Revenue in Dollars of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.

    In [381]:
    world_cur3 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='PG-13' and Drama_DataFrame.Profit[i] > 0:
                world_cur3.append(Drama_DataFrame.Worldwide_Gross_x[i])
    
    print(world_cur3)
    
    ['$693,698,673', '$634,454,789', '$137,551,594', '$90,552,675', '$213,591,522', '$179,748,880', '$108,660,270', '$71,004,627', '$203,127,894', '$48,478,084', '$162,498,338', '$169,590,606', '$116,809,717', '$173,567,581', '$97,143,987', '$85,309,093', '$252,276,928', '$61,721,826', '$63,802,928', '$165,552,290', '$197,618,160', '$68,984,536', '$94,050,951', '$41,059,418', '$213,120,004', '$142,033,509', '$96,633,833', '$66,540,205', '$29,847,480', '$82,917,283', '$64,282,881', '$208,265,198', '$22,281,732', '$76,086,711', '$334,522,294', '$38,028,230', '$52,545,707', '$56,506,120', '$128,955,898', '$20,601,987', '$59,168,692', '$34,044,909', '$33,069,303', '$32,909,437', '$23,477,345', '$78,356,170', '$62,076,141', '$61,603,136', '$31,556,959', '$36,787,044', '$81,831,866', '$21,971,021', '$31,187,727', '$36,964,656', '$16,369,708', '$148,806,510', '$41,699,612', '$18,945,682', '$6,205,034', '$15,298,355', '$35,185,884', '$5,552,584', '$3,728,400', '$2,102,779']
    

    Getting all the Worldwide Revenue in Integer of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.

    In [382]:
    world_int3= []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='PG-13' and Drama_DataFrame.Profit[i] > 0:
                 world_int3.append(Drama_DataFrame.Worldwide_Gross[i])
    
    print(world_int3)
    
    [693698673, 634454789, 137551594, 90552675, 213591522, 179748880, 108660270, 71004627, 203127894, 48478084, 162498338, 169590606, 116809717, 173567581, 97143987, 85309093, 252276928, 61721826, 63802928, 165552290, 197618160, 68984536, 94050951, 41059418, 213120004, 142033509, 96633833, 66540205, 29847480, 82917283, 64282881, 208265198, 22281732, 76086711, 334522294, 38028230, 52545707, 56506120, 128955898, 20601987, 59168692, 34044909, 33069303, 32909437, 23477345, 78356170, 62076141, 61603136, 31556959, 36787044, 81831866, 21971021, 31187727, 36964656, 16369708, 148806510, 41699612, 18945682, 6205034, 15298355, 35185884, 5552584, 3728400, 2102779]
    

    Getting all the Profit in Dollars of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.

    In [383]:
    profit_cur3 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='PG-13' and Drama_DataFrame.Profit[i] > 0:
                profit_cur3.append(Drama_DataFrame.Profit_x[i])
    
    print(profit_cur3)
    
    ['$583,698,673', '$559,454,789', '$77,551,594', '$35,552,675', '$163,591,522', '$129,748,880', '$58,660,270', '$22,004,627', '$156,127,894', '$4,478,084', '$122,498,338', '$129,590,606', '$78,809,717', '$136,567,581', '$60,143,987', '$49,309,093', '$217,276,928', '$26,721,826', '$29,802,928', '$132,552,290', '$167,618,160', '$38,984,536', '$66,050,951', '$15,059,418', '$188,120,004', '$117,033,509', '$71,633,833', '$41,540,205', '$4,847,480', '$57,917,283', '$40,282,881', '$188,265,198', '$2,281,732', '$57,086,711', '$317,522,294', '$21,028,230', '$36,545,707', '$40,506,120', '$113,955,898', '$5,601,987', '$44,168,692', '$20,044,909', '$20,069,303', '$20,909,437', '$11,477,345', '$67,356,170', '$51,076,141', '$51,603,136', '$21,556,959', '$27,087,044', '$72,831,866', '$12,971,021', '$23,787,727', '$29,964,656', '$10,369,708', '$143,806,510', '$36,699,612', '$13,945,682', '$1,205,034', '$12,698,355', '$33,185,884', '$4,152,584', '$3,478,400', '$1,927,779']
    

    Getting all the Profit in Integer of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.

    In [70]:
    profit_int3 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='PG-13' and Drama_DataFrame.Profit[i] > 0:
                profit_int3.append(int(Drama_DataFrame.Profit[i]))
    
    print(profit_int3)
    
    [583698673, 559454789, 77551594, 35552675, 163591522, 129748880, 58660270, 22004627, 156127894, 4478084, 122498338, 129590606, 78809717, 136567581, 60143987, 49309093, 217276928, 26721826, 29802928, 132552290, 167618160, 38984536, 66050951, 15059418, 188120004, 117033509, 71633833, 41540205, 4847480, 57917283, 40282881, 188265198, 2281732, 57086711, 317522294, 21028230, 36545707, 40506120, 113955898, 5601987, 44168692, 20044909, 20069303, 20909437, 11477345, 67356170, 51076141, 51603136, 21556959, 27087044, 72831866, 12971021, 23787727, 29964656, 10369708, 143806510, 36699612, 13945682, 1205034, 12698355, 33185884, 4152584, 3478400, 1927779]
    

    Getting all the Net Profit Margin of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.

    In [385]:
    npm3 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='PG-13' and Drama_DataFrame.Profit[i] > 0:
                npm3.append(int(Drama_DataFrame.Profit[i]/Drama_DataFrame.Worldwide_Gross[i]*100))
    
    print(npm3)
    
    [84, 88, 56, 39, 76, 72, 53, 30, 76, 9, 75, 76, 67, 78, 61, 57, 86, 43, 46, 80, 84, 56, 70, 36, 88, 82, 74, 62, 16, 69, 62, 90, 10, 75, 94, 55, 69, 71, 88, 27, 74, 58, 60, 63, 48, 85, 82, 83, 68, 73, 89, 59, 76, 81, 63, 96, 88, 73, 19, 83, 94, 74, 93, 91]
    

    Creating a list consisting of 'PG-13' repeated 64 times for the PG-13 rated category due to it having 64 movies for the new dataframe that will be created below.

    In [386]:
    size_3 = []
    for i in list(range(64)):
        size_3.append('PG-13')
    print(size_3)
    
    ['PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13']
    

    Converting the list consisting of Net Profit Margin of all the PG-13 rated movies from integer to percentage.

    In [387]:
    npm3_percent = []
    for i in npm3:
        npm3_percent.append("{:}%".format(i))
    print(npm3_percent)
    
    ['84%', '88%', '56%', '39%', '76%', '72%', '53%', '30%', '76%', '9%', '75%', '76%', '67%', '78%', '61%', '57%', '86%', '43%', '46%', '80%', '84%', '56%', '70%', '36%', '88%', '82%', '74%', '62%', '16%', '69%', '62%', '90%', '10%', '75%', '94%', '55%', '69%', '71%', '88%', '27%', '74%', '58%', '60%', '63%', '48%', '85%', '82%', '83%', '68%', '73%', '89%', '59%', '76%', '81%', '63%', '96%', '88%', '73%', '19%', '83%', '94%', '74%', '93%', '91%']
    

    Creating a list consisting of 'Revenue' repeated 64 times followed by 'Profit' repeated 64 times for the PG-13 rated category, because it has 64 movies, for the new dataframe that will be created below.

    In [388]:
    pg13_rate = []
    for i in list(range(64)):
        pg13_rate.append('Revenue')
    for i in list(range(64)):
        pg13_rate.append('Profit')
    print(pg13_rate)
    
    ['Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit']
    

    These are the variables needed to create the columns for the fifth dataframe: 'df5'

    Getting all the Names of the movies that are NC-17 rated from the 'Drama_DataFrame' dataframe.

    In [319]:
    name4= []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='NC-17'and Drama_DataFrame.Profit[i] > 0:
                name4.append(Drama_DataFrame.Movie[i])
    print(name4)
    
    ['Shame', 'Matador', 'Whore', 'Tokyo Decadence', 'Wide Sargasso Sea', 'Kids', 'Crash', 'The Dreamers', 'Lust, Caution', 'Shame', 'Blue Is the Warmest Colour', 'The Dreamers', 'Shame', 'Blue Is the Warmest Colour', 'Blue Valentine', 'Two Girls and a Guy', 'Elles', 'Hell', 'Se, jie', 'The Evil Dead', 'Shame', 'Arabian Nights', 'Natural Born Killers', 'Clerks', 'Bad Lieutenant', 'Beyond the Valley of the Dolls', 'Kids', 'Crash', 'Last Tango in Paris', 'Pink Flamingos', 'Lust, Caution ', 'Happiness 1998', 'Whore 1991', 'Law of Desire']
    

    Getting all the Worldwide Revenue in Dollars of the movies that are NC-17 rated from the 'Drama_DataFrame' dataframe.

    In [390]:
    world_cur4 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='NC-17'and Drama_DataFrame.Profit[i] > 0:
                world_cur4.append(Drama_DataFrame.Worldwide_Gross_x[i])
    
    print(world_cur4)
    
    ['$20,412,841', '$17,356,268', '$1,008,404', '$277,845', '$1,614,784', '$20,412,216', '$98,410,061', '$15,121,165', '$67,091,915', '$20,412,841', '$19,465,835', '$15,307,113', '$20,412,841', '$19,465,835', '$16,566,240', '$2,315,026', '$3,822,241', '$213,120,004', '$65,167,430', '$2,661,944', '$20,412,841', '$3,453,416', '$50,283,563', '$3,894,240', '$2,038,916', '$9,000,000', '$20,412,216', '$101,173,038', '$36,147,711', '$413,802', '$65,167,430', '$5,746,453', '$1,008,404', '$1,470,809']
    

    Getting all the Worldwide Revenue in Integer of the movies that are NC-17 rated from the 'Drama_DataFrame' dataframe.

    In [391]:
    world_int4 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='NC-17'and Drama_DataFrame.Profit[i] > 0:
                world_int4.append(Drama_DataFrame.Worldwide_Gross[i])            
    print(world_int4)
    
    [20412841, 17356268, 1008404, 277845, 1614784, 20412216, 98410061, 15121165, 67091915, 20412841, 19465835, 15307113, 20412841, 19465835, 16566240, 2315026, 3822241, 213120004, 65167430, 2661944, 20412841, 3453416, 50283563, 3894240, 2038916, 9000000, 20412216, 101173038, 36147711, 413802, 65167430, 5746453, 1008404, 1470809]
    

    Getting all the Profit in Dollars of the movies that are NC-17 rated from the 'Drama_DataFrame' dataframe.

    In [392]:
    profit_cur4 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='NC-17'and Drama_DataFrame.Profit[i] > 0:
                profit_cur4.append(Drama_DataFrame.Profit_x[i])
    print(profit_cur4)
    
    ['$13,912,841', '$4,856,268', '$8,404', '$257,845', '$659,312', '$18,912,216', '$89,410,061', '$121,165', '$52,091,915', '$13,912,841', '$15,465,835', '$307,113', '$13,912,841', '$15,390,895', '$15,566,240', '$1,315,026', '$256,669', '$201,120,004', '$50,167,430', '$2,311,944', '$13,912,841', '$2,548,651', '$16,283,563', '$3,664,240', '$1,038,916', '$8,000,000', '$18,912,216', '$94,673,038', '$34,897,711', '$401,802', '$50,167,430', '$3,546,453', '$958,404', '$858,737']
    

    Getting all the Profit in Integer of the movies that are NC-17 rated from the 'Drama_DataFrame' dataframe.

    In [69]:
    profit_int4 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='NC-17'and Drama_DataFrame.Profit[i] > 0:
                profit_int4.append(int(Drama_DataFrame.Profit[i]))
    print(profit_int4)
    
    [13912841, 4856268, 8404, 257845, 659312, 18912216, 89410061, 121165, 52091915, 13912841, 15465835, 307113, 13912841, 15390895, 15566240, 1315026, 256669, 201120004, 50167430, 2311944, 13912841, 2548651, 16283563, 3664240, 1038916, 8000000, 18912216, 94673038, 34897711, 401802, 50167430, 3546453, 958404, 858737]
    

    Getting all the Net Profit Margin of the movies that are NC-17 rated from the 'Drama_DataFrame' dataframe.

    In [394]:
    npm4 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x =='NC-17'and Drama_DataFrame.Profit[i] > 0:
                npm4.append(int(Drama_DataFrame.Profit[i]/Drama_DataFrame.Worldwide_Gross[i]*100))
    print(npm4)
    
    [68, 27, 0, 92, 40, 92, 90, 0, 77, 68, 79, 2, 68, 79, 93, 56, 6, 94, 76, 86, 68, 73, 32, 94, 50, 88, 92, 93, 96, 97, 76, 61, 95, 58]
    

    Creating a list consisting of 'NC-17' repeated 34 times for the NC-17 rated category due to it having 34 movies for the new dataframe that will be created below.

    In [395]:
    size_4 = []
    for i in list(range(34)):
        size_4.append('NC-17')
    print(size_4)
    
    ['NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17']
    

    Converting the list consisting of Net Profit Margin of all the NC-17 rated movies from integer to percentage.

    In [396]:
    npm4_percent = []
    for i in npm4:
        npm4_percent.append("{:}%".format(i))
    print(npm4_percent)
    
    ['68%', '27%', '0%', '92%', '40%', '92%', '90%', '0%', '77%', '68%', '79%', '2%', '68%', '79%', '93%', '56%', '6%', '94%', '76%', '86%', '68%', '73%', '32%', '94%', '50%', '88%', '92%', '93%', '96%', '97%', '76%', '61%', '95%', '58%']
    

    Creating a list consisting of 'Revenue' repeated 34 times followed by 'Profit' repeated 34 times for the NC-17 rated category, because it has 34 movies, for the new dataframe that will be created below.

    In [397]:
    nc17_rate = []
    for i in list(range(34)):
        nc17_rate.append('Revenue')
    for i in list(range(34)):
        nc17_rate.append('Profit')
    print(nc17_rate)
    
    ['Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit']
    

    Creating dataframes df1, df2, df3, df4 and df5.

    In [398]:
    df5 = pd.DataFrame({'Name':name4,'Revenue':world_cur4, "Profit":profit_cur4,
                        'Int Revenue':world_int4, "Int Profit":profit_int4,'System Ratings':size_4,
                        'Net Profit Margin %':npm4_percent})
    df4 = pd.DataFrame({'Name':name3,'Revenue':world_cur3, "Profit":profit_cur3,
                        'Int Revenue':world_int3, "Int Profit":profit_int3,'System Ratings':size_3,
                        'Net Profit Margin %':npm3_percent})
    df3 = pd.DataFrame({'Name':name2,'Revenue':world_cur2, "Profit":profit_cur2,
                        'Int Revenue':world_int2, "Int Profit":profit_int2,'System Ratings':size_2,
                        'Net Profit Margin %':npm2_percent})
    df2 = pd.DataFrame({'Name':name1,'Revenue':world_cur1, "Profit":profit_cur1,
                        'Int Revenue':world_int1, "Int Profit":profit_int1,'System Ratings':size_1,
                        'Net Profit Margin %':npm1_percent})
    df1 = pd.DataFrame({'Name':name,'Revenue':world_cur, "Profit":profit_cur,
                        'Int Revenue':world_int, "Int Profit":profit_int,'System Ratings':size,
                        'Net Profit Margin %':npm_percent})
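
    Each of the five dataframes above is assembled from parallel lists that were all filtered by the same condition. As a sketch only, one helper could apply that filter once per rating. The function name 'profitable_by_rating' and the toy rows are hypothetical, the dollar-formatted columns are left out, and the input is assumed to have the Movie, Rating, Profit and Worldwide_Gross columns used earlier.

```python
import pandas as pd

def profitable_by_rating(frame, rating):
    """Return one frame of profitable movies for a single rating."""
    rows = frame[(frame["Rating"] == rating) & (frame["Profit"] > 0)]
    # Same truncated margin as the loops above.
    margin = (rows["Profit"] / rows["Worldwide_Gross"] * 100).astype(int)
    return pd.DataFrame({
        "Name": rows["Movie"],
        "Int Revenue": rows["Worldwide_Gross"],
        "Int Profit": rows["Profit"],
        "System Ratings": rating,
        "Net Profit Margin %": margin.astype(str) + "%",
    }).reset_index(drop=True)

# Hypothetical toy data, not taken from Drama_DataFrame.
toy = pd.DataFrame({
    "Movie": ["A", "B", "C"],
    "Rating": ["G", "PG", "G"],
    "Profit": [30, 10, -5],
    "Worldwide_Gross": [60, 40, 20],
})
g_frame = profitable_by_rating(toy, "G")
print(g_frame)
```

    Filtering once per rating keeps every column aligned by construction, so there is no need to count the movies first in order to size the 'System Ratings' list.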
    

    The 'df1' dataframe. (this dataframe is interactive)

    In [442]:
    df1
    
    Out[442]:
    Name Revenue Profit Int Revenue Int Profit System Ratings Net Profit Margin %

    Copying the npm list, which holds the Net Profit Margin of the R-rated movies, into npm_sort and sorting it in ascending order.

    In [454]:
    npm_sort = []
    for i in npm:npm_sort.append(i)
    npm_sort.sort();print(npm_sort)
    
    [0, 4, 4, 11, 13, 17, 18, 25, 26, 27, 28, 29, 30, 34, 39, 42, 44, 56, 57, 60, 64, 66, 66, 68, 68, 70, 71, 72, 72, 77, 80, 80, 80, 81, 82, 83, 83, 85, 85, 85, 87, 87, 87, 88, 89, 89, 89, 90, 91, 91, 92, 93, 93, 94, 96, 96]
    

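    Getting, for each value in the sorted npm_sort list, the index of its first occurrence in the original npm list. Movies with the same Net Profit Margin therefore share one index.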

    In [455]:
    index_npm = []
    for x,i in enumerate(npm_sort):
        if i in npm:index_npm.append(npm.index(i))
    print(index_npm)
    
    [21, 30, 30, 54, 46, 53, 55, 8, 5, 12, 2, 15, 11, 27, 19, 51, 13, 25, 14, 6, 18, 17, 17, 41, 41, 10, 28, 22, 22, 0, 9, 9, 9, 45, 38, 1, 1, 3, 3, 3, 35, 35, 35, 50, 26, 26, 26, 42, 32, 32, 7, 31, 31, 52, 16, 16]
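
    Since list.index() returns the position of the first match, tied margins map to the same row, which is why 30 appears twice and 9 three times in the output above. A minimal sketch of an alternative is to sort the positions rather than the values. 'npm_demo' and its values are hypothetical.

```python
# Hypothetical margins containing ties.
npm_demo = [77, 83, 4, 83, 4]

# list.index() always returns the first match, so ties collapse.
via_index = [npm_demo.index(v) for v in sorted(npm_demo)]

# Sorting the positions by their value keeps every row exactly once.
order = sorted(range(len(npm_demo)), key=lambda pos: npm_demo[pos])

print(via_index)
print(order)
```

    Passing a list like 'order' to iloc would give each movie once, so no duplicated rows would need to be dropped afterwards.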
    

    Re-arranging the dataframe df1 by the Net Profit Margin of the movies in ascending order and saving the result as a new dataframe, df_r1.

    In [456]:
    df_r1 = df1.iloc[index_npm]
    

    Resetting the index of the df_r1 dataframe.

    In [457]:
    df_r1 = df_r1.reset_index()
    

    Deleting unwanted columns from the newly created dataframe df_r1.

    In [458]:
    del df_r1['index']
    del df_r1['Int Revenue']
    del df_r1['Int Profit']
    del df_r1['System Ratings']
    

    Deleting rows that are duplicated and resetting the index in the dataframe df_r1.

    In [459]:
    df_r1 =  df_r1.drop([2, 22, 24, 30, 36, 37, 40, 38, 41, 44, 45, 48, 51, 54, 28, 31])
    df_r1 = df_r1.reset_index()
    

    Deleting the 'index' column, which is added whenever the index of a dataframe is reset.

    In [460]:
    del df_r1['index']
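
    The drop, reset and delete steps above can be sketched in one chain. This assumes the unwanted rows are exact duplicates, so it may not remove precisely the rows listed by number above. The frame 'demo' is hypothetical.

```python
import pandas as pd

# Hypothetical rows with one exact duplicate.
demo = pd.DataFrame({
    "Name": ["Movie A", "Movie A", "Movie B"],
    "Net Profit Margin %": ["4%", "4%", "11%"],
})

# drop=True discards the old index instead of adding an 'index' column.
deduped = demo.drop_duplicates().reset_index(drop=True)
print(deduped)
```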
    

    The first ten rows in the df_r1 dataframe.

    In [261]:
    df_r1.head(10)
    
    Out[261]:
       Name               Revenue      Profit       Net Profit Margin %
    0  Stoker             $12,034,913  $34,913      0%
    1  Take Shelter       $4,972,016   $222,016     4%
    2  Take Shelter       $4,972,016   $222,016     4%
    3  Rich and Famous    $13,000,000  $1,500,000   11%
    4  Palo Alto          $1,156,309   $156,309     13%
    5  Zoot Suit          $3,256,082   $556,082     17%
    6  Raggedy Man        $11,000,000  $2,000,000   18%
    7  The Master         $50,647,416  $13,147,416  25%
    8  Crimson Peak       $74,966,854  $19,966,854  26%
    9  The Water Diviner  $31,054,727  $8,554,727   27%

    Grouping the Net Profit Margin values into five categories (0-20, 20-40, 40-60, 60-80 and 80-100) and storing each category as a string label with a percentage symbol.

    In [263]:
    cat_npm = []
    for i in npm:
        if 0 <= i < 20: cat_npm.append('0% - 20%')
        if 20 <= i < 40: cat_npm.append('20% - 40%')
        if 40 <= i < 60: cat_npm.append('40% - 60%')
        if 60 <= i < 80: cat_npm.append('60% - 80%')
        if 80 <= i < 100: cat_npm.append('80% - 100%')
    print(cat_npm)
    
    ['60% - 80%', '80% - 100%', '20% - 40%', '80% - 100%', '80% - 100%', '20% - 40%', '60% - 80%', '80% - 100%', '20% - 40%', '80% - 100%', '60% - 80%', '20% - 40%', '20% - 40%', '40% - 60%', '40% - 60%', '20% - 40%', '80% - 100%', '60% - 80%', '60% - 80%', '20% - 40%', '60% - 80%', '0% - 20%', '60% - 80%', '80% - 100%', '80% - 100%', '40% - 60%', '80% - 100%', '20% - 40%', '60% - 80%', '60% - 80%', '0% - 20%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '0% - 20%', '60% - 80%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '0% - 20%', '60% - 80%', '80% - 100%', '80% - 100%', '80% - 100%', '40% - 60%', '80% - 100%', '0% - 20%', '0% - 20%', '0% - 20%']
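
    The chain of if statements above can also be expressed with pandas.cut, sketched here on hypothetical margins chosen to touch the band edges. The bin edges and labels are copied from the loop; the names 'margins', 'edges', 'labels' and 'bands' are assumptions for illustration.

```python
import pandas as pd

# Hypothetical margins chosen to hit every band and its lower edge.
margins = [0, 19, 20, 59, 60, 96]

edges = [0, 20, 40, 60, 80, 100]
labels = ["0% - 20%", "20% - 40%", "40% - 60%", "60% - 80%", "80% - 100%"]

# right=False makes each bin [low, high), matching `low <= i < high`.
bands = pd.cut(margins, bins=edges, labels=labels, right=False)
print(list(bands))
```

    As with the loop, a margin of exactly 100 would fall outside every band, so the last edge would need to be raised if such a value could occur.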
    

    Using Counter to see how many movies are in each category. There are 7 R-rated movies with a Net Profit Margin between '0% - 20%', 8 between '20% - 40%', 4 between '40% - 60%', 11 between '60% - 80%' and 26 between '80% - 100%'.

    In [58]:
    Counter(cat_npm)
    
    Out[58]:
    Counter({'60% - 80%': 11,
             '80% - 100%': 26,
             '20% - 40%': 8,
             '40% - 60%': 4,
             '0% - 20%': 7})

    Styling the first 20 rows of the df_r1 dataframe with a function that colours rows by their index position.

    In [461]:
    def highlight_cells13(x):
        df = x.copy()
        df.loc[:,:] = '' 
        df.iloc[0,:] = 'background-color:#EFB8B8;color:black;border-bottom: 2px solid black'
        df.iloc[1,:] = 'background-color:#EFB8B8;color:black;border-bottom: 2px solid black'
        df.iloc[2,:] = 'background-color:#EFB8B8;color:black;border-bottom: 2px solid black'
        df.iloc[3,:] = 'background-color:#EFB8B8;color:black;border-bottom: 2px solid black'
        df.iloc[4,:] = 'background-color:#EFB8B8;color:black;border-bottom: 2px solid black'
        df.iloc[5,:] = 'background-color:#EFB8B8;color:black;border-bottom: 2px solid black'
        
        df.iloc[6,:] = 'background-color:#E66A6A;color:black;border-bottom: 2px solid black'
        df.iloc[7,:] = 'background-color:#E66A6A;color:black;border-bottom: 2px solid black'
        df.iloc[8,:] = 'background-color:#E66A6A;color:black;border-bottom: 2px solid black'
        df.iloc[9,:] = 'background-color:#E66A6A;color:black;border-bottom: 2px solid black'
        df.iloc[10,:] = 'background-color:#E66A6A;color:black;border-bottom: 2px solid black'
        df.iloc[11,:] = 'background-color:#E66A6A;color:black;border-bottom: 2px solid black'
        df.iloc[12,:] = 'background-color:#E66A6A;color:black;border-bottom: 2px solid black'
        df.iloc[13,:] = 'background-color:#E66A6A;color:black;border-bottom: 2px solid black'
        
        df.iloc[14,:] = 'background-color:#FF0000;color:white;border-bottom: 2px solid black'
        df.iloc[15,:] = 'background-color:#FF0000;color:white;border-bottom: 2px solid black'
        df.iloc[16,:] = 'background-color:#FF0000;color:white;border-bottom: 2px solid black'
        df.iloc[17,:] = 'background-color:#FF0000;color:white;border-bottom: 2px solid black'
        
        df.iloc[18,:] = 'background-color:#C20404;color:white;border-bottom: 2px solid black'
        df.iloc[19,:] = 'background-color:#C20404;color:white;border-bottom: 2px solid black'
        
        
        #df.iloc[6,:] = 'selector:th.row_heading;border-bottom: 3px solid red '#FF0000','#C20404' '
        return df 
    df_r2 = df_r1[:20].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
                {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
                #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
                {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
                 ])\
                .apply(highlight_cells13, axis=None)
    

    Saving the df_r2 dataframe to the df_r2.png file as an image to be used for the analysis later on.

    In [462]:
    dfi.export(df_r2, 'df_r2.png')
    

    The df_r2 dataframe.

    Styling the df_r3 dataframe with a function that assigns colors by row index.

    In [463]:
    def highlight_cells13(x):
        df = x.copy()
        df.loc[:,:] = '' 
        df.iloc[0,:] = 'background-color:#C20404;color:white;border-bottom: 2px solid black'
        df.iloc[1,:] = 'background-color:#C20404;color:white;border-bottom: 2px solid black'
        df.iloc[2,:] = 'background-color:#C20404;color:white;border-bottom: 2px solid black'
        df.iloc[3,:] = 'background-color:#C20404;color:white;border-bottom: 2px solid black'
        df.iloc[4,:] = 'background-color:#C20404;color:white;border-bottom: 2px solid black'
        df.iloc[5,:] = 'background-color:#C20404;color:white;border-bottom: 2px solid black'
        
        df.iloc[6,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[7,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[8,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[9,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[10,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[11,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[12,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[13,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[14,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[15,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[16,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[17,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[18,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        df.iloc[19,:] = 'background-color:#690000;color:white;border-bottom: 2px solid black'
        
        #df.iloc[6,:] = 'selector:th.row_heading;border-bottom: 3px solid red '#FF0000','#C20404' '
        return df 
    df_r3 = df_r1[20:].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
                {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
                #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
                {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
                 ])\
                .apply(highlight_cells13, axis=None)
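
The styling functions in this part of the notebook assign one CSS string per row, one `df.iloc[...]` line at a time. As a minimal sketch (not the notebook's own code), the same table could be described as a short list of colour bands and expanded in a loop. The names `band_styles`, `make_highlighter` and `r3_bands` are hypothetical; the band sizes and colours are copied from the df_r3 cell above.

```python
from itertools import chain, repeat

BORDER = "border-bottom: 2px solid black"


def band_styles(bands):
    """Expand (row_count, background, text_color) bands into one CSS string per row."""
    return list(chain.from_iterable(
        repeat(f"background-color:{bg};color:{fg};{BORDER}", count)
        for count, bg, fg in bands
    ))


def make_highlighter(bands):
    """Return a function that can be passed to Styler.apply(..., axis=None)."""
    styles = band_styles(bands)

    def highlight(x):
        css = x.astype(str)   # same shape and labels as x, with string cells
        css.loc[:, :] = ""    # rows beyond the listed bands stay unstyled
        for pos, style in enumerate(styles[:len(css)]):
            css.iloc[pos, :] = style
        return css

    return highlight


# Bands matching the df_r3 table: 6 rows of #C20404, then 14 rows of #690000.
r3_bands = [(6, "#C20404", "white"), (14, "#690000", "white")]
r3_styles = band_styles(r3_bands)
print(len(r3_styles), r3_styles[0])
```

Under these assumptions, `df_r1[20:].style.apply(make_highlighter(r3_bands), axis=None)` would take the place of the hand-written function. That call is shown for illustration and was not run against the notebook's data.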
    

    Saving the df_r3 dataframe to the df_r3.png file as an image to be used for the analysis later on.

    In [464]:
    dfi.export(df_r3, 'df_r3.png')
    

    The df_r3 dataframe.

    The 'df2' dataframe. (this dataframe is interactive)

    In [443]:
    df2
    
    Out[443]:
    Name Revenue Profit Int Revenue Int Profit System Ratings Net Profit Margin %

    Sorting the npm_sort1 list, which holds the Net Profit Margins of the PG-rated movies, in ascending order.

    In [440]:
    npm_sort1 = sorted(npm1)  # sorted copy; npm1 keeps its original order
    print(npm_sort1)
    
    [0, 17, 22, 27, 34, 41, 46, 47, 47, 49, 50, 53, 57, 60, 61, 65, 66, 70, 71, 72, 75, 76, 77, 78, 78, 78, 78, 81, 82, 84, 86, 86, 86, 87, 87, 88, 89, 93, 93, 95, 95, 95, 96, 98, 98, 99]
    

    Getting, for each value in the sorted npm_sort1 list, its position in the original npm1 list.

    In [441]:
    index_npm1 = []
    for i in npm_sort1:
        # list.index returns the first match, so tied margins repeat the same position
        index_npm1.append(npm1.index(i))
    print(index_npm1)
    
    [0, 16, 15, 27, 31, 45, 43, 6, 6, 39, 29, 20, 18, 36, 1, 17, 42, 38, 34, 33, 14, 30, 22, 3, 3, 3, 3, 11, 8, 32, 5, 5, 5, 19, 19, 28, 44, 2, 2, 4, 4, 4, 23, 13, 13, 41]
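
The printed positions contain repeats (for example 6, 6 and 3, 3, 3, 3). This is because `list.index` always returns the first match, so movies that share a margin all map to the same row. As a hedged alternative, the positions can be obtained by sorting the positions themselves, which keeps every row exactly once. The function name `ascending_positions` and the sample list are hypothetical and are not taken from the notebook.

```python
def ascending_positions(values):
    """Positions of `values` in ascending order; ties keep their original order."""
    return sorted(range(len(values)), key=lambda pos: values[pos])


# Hypothetical margins with a tie at 47.
sample_npm = [47, 0, 47, 17]

unique_order = ascending_positions(sample_npm)
print(unique_order)   # [1, 3, 0, 2]

# The lookup used in the notebook repeats position 0 for the tied value.
first_match = [sample_npm.index(v) for v in sorted(sample_npm)]
print(first_match)    # [1, 3, 0, 0]
```

With positions like `unique_order`, the later step that drops duplicated rows by hand would no longer be needed, because no row is selected twice.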
    

    Re-arranging the dataframe df2 by the Net Profit Margin of the movies, in ascending order, and storing the result in a new dataframe, df_pg.

    In [442]:
    df_pg = df2.iloc[index_npm1]
    

    Resetting the index of the df_pg dataframe.

    In [443]:
    df_pg = df_pg.reset_index()
    

    Deleting unwanted columns from the newly created dataframe df_pg.

    In [444]:
    del df_pg['index']
    del df_pg['Int Revenue']
    del df_pg['Int Profit']
    del df_pg['System Ratings']
    

    Deleting the duplicated rows and resetting the index of the dataframe df_pg.

    In [445]:
    df_pg =  df_pg.drop([7, 22, 23, 24, 25, 30, 31, 33, 37, 39, 40, 43, 45])
    df_pg = df_pg.reset_index()
    

    Deleting the 'index' column, which reset_index() adds whenever the index of a dataframe is reset.

    In [446]:
    del df_pg['index']
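
The reorder, column-delete, de-duplicate and index-reset steps above could also be written as one pandas chain. The following is a sketch under stated assumptions: `demo` is a hypothetical stand-in for df2 that reuses the notebook's column names, and `demo_npm` plays the role of npm1. `reset_index(drop=True)` avoids creating the extra 'index' column, and `drop_duplicates()` replaces the hard-coded list of row numbers.

```python
import pandas as pd

# Hypothetical stand-in for df2; only the column names come from the notebook.
demo = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Revenue": ["$10", "$20", "$30", "$40"],
    "Profit": ["$5", "$0", "$15", "$8"],
    "Int Revenue": [10, 20, 30, 40],
    "Int Profit": [5, 0, 15, 8],
    "System Ratings": ["PG"] * 4,
    "Net Profit Margin %": ["50%", "0%", "50%", "20%"],
})
demo_npm = [50, 0, 50, 20]  # plays the role of npm1

# Row positions in ascending margin order; each row appears exactly once.
order = sorted(range(len(demo_npm)), key=lambda pos: demo_npm[pos])

demo_sorted = (
    demo.iloc[order]
    .drop(columns=["Int Revenue", "Int Profit", "System Ratings"])
    .drop_duplicates()          # removes rows that are identical in every column
    .reset_index(drop=True)     # drop=True discards the old index
)
print(demo_sorted)
```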
    

    Shortening the name of one movie so that it fits in the dataframe.

    In [447]:
    df_pg.loc[8, 'Name'] = 'The Night the Lights...'  # full title: 'The Night the Lights Went Out in Georgia'
    

    The first ten rows in the df_pg dataframe.

    In [202]:
    df_pg.head(10)
    
    Out[202]:
    Name Revenue Profit Net Profit Margin %
    0 Hugo $180,047,784 $47,784 0%
    1 Dreamer $38,741,732 $6,741,732 17%
    2 Tuck Everlasting $19,344,615 $4,344,615 22%
    3 The Spanish Prisoner $13,835,130 $3,835,130 27%
    4 Pure Country $15,164,458 $5,164,458 34%
    5 The Natural $48,000,000 $20,000,000 41%
    6 Tender Mercies $8,443,124 $3,943,124 46%
    7 Somewhere in Time $9,709,597 $4,609,597 47%
    8 The Night the Lights... $14,923,752 $7,423,752 49%
    9 The Secret of Roan Inish $6,101,815 $3,101,815 50%

    Putting each Net Profit Margin into one of five categories, labelled as percentage strings: 0-20, 20-40, 40-60, 60-80 and 80-100.

    In [448]:
    cat_npm1 = []
    for i in npm1:
        if 0 <= i < 20: cat_npm1.append('0% - 20%')
        if 20 <= i < 40: cat_npm1.append('20% - 40%')
        if 40 <= i < 60: cat_npm1.append('40% - 60%')
        if 60 <= i < 80: cat_npm1.append('60% - 80%')
        if 80 <= i < 100: cat_npm1.append('80% - 100%')
    

    Using Counter to see how many movies are in each category. There are 2 PG-rated movies that have a Net Profit Margin between '0% - 20%'. There are 3 PG-rated movies that have a Net Profit Margin between '20% - 40%'. There are 8 PG-rated movies that have a Net Profit Margin between '40% - 60%'. There are 14 PG-rated movies that have a Net Profit Margin between '60% - 80%'. There are 19 PG-rated movies that have a Net Profit Margin between '80% - 100%'.

    In [155]:
    Counter(cat_npm1)
    
    Out[155]:
    Counter({'0% - 20%': 2,
             '60% - 80%': 14,
             '80% - 100%': 19,
             '40% - 60%': 8,
             '20% - 40%': 3})
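
The chain of `if` statements above can be expressed with integer division, as in the sketch below. The function name `npm_category` is hypothetical. One difference is an assumption about intent: the original conditions leave a margin of exactly 100 without a category, while this version places it in the top band. The margins are copied from the printed npm_sort1 list.

```python
from collections import Counter


def npm_category(margin, width=20):
    """Label an integer margin in [0, 100] with its 20-point band."""
    if not 0 <= margin <= 100:
        raise ValueError(f"margin out of range: {margin}")
    # Assumption: a margin of exactly 100 belongs to the top band.
    low = min(margin // width * width, 100 - width)
    return f"{low}% - {low + width}%"


# Values copied from the printed npm_sort1 list.
pg_margins = [0, 17, 22, 27, 34, 41, 46, 47, 47, 49, 50, 53, 57, 60, 61, 65,
              66, 70, 71, 72, 75, 76, 77, 78, 78, 78, 78, 81, 82, 84, 86, 86,
              86, 87, 87, 88, 89, 93, 93, 95, 95, 95, 96, 98, 98, 99]

pg_counts = Counter(npm_category(m) for m in pg_margins)
print(pg_counts)
```

The resulting counts agree with the Counter output shown above (2, 3, 8, 14 and 19).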

    Styling the df_pg1 dataframe with a function that assigns colors by row index.

    In [449]:
    def highlight_cells13(x):
        df = x.copy()
        df.loc[:,:] = '' 
        df.iloc[0,:] = 'background-color:#F8C9B4;color:black;border-bottom: 2px solid black'
        df.iloc[1,:] = 'background-color:#F8C9B4;color:black;border-bottom: 2px solid black'
        
        df.iloc[2,:] = 'background-color:#F5966B;color:black;border-bottom: 2px solid black'
        df.iloc[3,:] = 'background-color:#F5966B;color:black;border-bottom: 2px solid black'
        df.iloc[4,:] = 'background-color:#F5966B;color:black;border-bottom: 2px solid black'
        
        df.iloc[5,:] = 'background-color:#FF5000;color:black;border-bottom: 2px solid black'
        df.iloc[6,:] = 'background-color:#FF5000;color:black;border-bottom: 2px solid black'
        df.iloc[7,:] = 'background-color:#FF5000;color:black;border-bottom: 2px solid black'
        df.iloc[8,:] = 'background-color:#FF5000;color:black;border-bottom: 2px solid black'
        df.iloc[9,:] = 'background-color:#FF5000;color:black;border-bottom: 2px solid black'
        df.iloc[10,:] = 'background-color:#FF5000;color:black;border-bottom: 2px solid black'
        df.iloc[11,:] = 'background-color:#FF5000;color:black;border-bottom: 2px solid black'
        
        df.iloc[12,:] = 'background-color:#C33F03;color:white;border-bottom: 2px solid black'
        df.iloc[13,:] = 'background-color:#C33F03;color:white;border-bottom: 2px solid black'
        df.iloc[14,:] = 'background-color:#C33F03;color:white;border-bottom: 2px solid black'
        df.iloc[15,:] = 'background-color:#C33F03;color:white;border-bottom: 2px solid black'
        
        
        #df.iloc[6,:] = 'selector:th.row_heading;border-bottom: 3px solid red '#FF0000','#C20404' '
        return df 
    df_pg1 = df_pg[:16].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
                {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
                #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
                {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
                 ])\
                .apply(highlight_cells13, axis=None)
    

    Saving the df_pg1 dataframe to the df_pg1.png file as an image to be used for the analysis later on.

    In [450]:
    dfi.export(df_pg1, 'df_pg1.png')
    

    The df_pg1 dataframe.

    Styling the df_pg2 dataframe with a function that assigns colors by row index.

    In [451]:
    def highlight_cells13(x):
        df = x.copy()
        df.loc[:,:] = '' 
        df.iloc[0,:] = 'background-color:#C33F03;color:white;border-bottom: 2px solid black'
        df.iloc[1,:] = 'background-color:#C33F03;color:white;border-bottom: 2px solid black'
        df.iloc[2,:] = 'background-color:#C33F03;color:white;border-bottom: 2px solid black'
        df.iloc[3,:] = 'background-color:#C33F03;color:white;border-bottom: 2px solid black'
        df.iloc[4,:] = 'background-color:#C33F03;color:white;border-bottom: 2px solid black'
        df.iloc[5,:] = 'background-color:#C33F03;color:white;border-bottom: 2px solid black'
        
        df.iloc[6,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        df.iloc[7,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        df.iloc[8,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        df.iloc[9,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        df.iloc[10,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        df.iloc[11,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        df.iloc[12,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        df.iloc[13,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        df.iloc[14,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        df.iloc[15,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        df.iloc[16,:] = 'background-color:#8C2E02;color:white;border-bottom: 2px solid black'
        
        #df.iloc[6,:] = 'selector:th.row_heading;border-bottom: 3px solid red '#FF0000','#C20404' '
        return df 
    df_pg2 = df_pg[16:].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
                {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
                #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
                {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
                 ])\
                .apply(highlight_cells13, axis=None)
    

    Saving the df_pg2 dataframe to the df_pg2.png file as an image to be used for the analysis later on.

    In [452]:
    dfi.export(df_pg2, 'df_pg2.png')
    

    The df_pg2 dataframe.

    The 'df3' dataframe. (this dataframe is interactive)

    In [444]:
    df3
    
    Out[444]:
    Name Revenue Profit Int Revenue Int Profit System Ratings Net Profit Margin %

    Sorting the npm_sort2 list, which holds the Net Profit Margins of the G-rated movies, in ascending order.

    In [429]:
    npm_sort2 = sorted(npm2)  # sorted copy; npm2 keeps its original order
    print(npm_sort2)
    
    [33, 33, 40, 43, 45, 62, 65, 70, 72, 72, 76, 76, 78, 78, 79, 80, 85, 85, 87, 92, 94, 95, 95, 97, 99]
    

    Getting, for each value in the sorted npm_sort2 list, its position in the original npm2 list.

    In [430]:
    index_npm2 = []
    for i in npm_sort2:
        # list.index returns the first match, so tied margins repeat the same position
        index_npm2.append(npm2.index(i))
    print(index_npm2)
    
    [9, 9, 11, 12, 5, 1, 4, 0, 2, 2, 19, 19, 7, 7, 20, 22, 10, 10, 8, 6, 14, 3, 3, 15, 18]
    

    Re-arranging the dataframe df3 by the Net Profit Margin of the movies, in ascending order, and storing the result in a new dataframe, df_g.

    In [431]:
    df_g = df3.iloc[index_npm2]
    

    Resetting the index of the df_g dataframe.

    In [432]:
    df_g = df_g.reset_index()
    

    Deleting unwanted columns from the newly created dataframe df_g.

    In [433]:
    del df_g['index']
    del df_g['Int Revenue']
    del df_g['Int Profit']
    del df_g['System Ratings']
    

    Deleting the duplicated rows and resetting the index of the dataframe df_g.

    In [434]:
    df_g =  df_g.drop([1, 9, 11, 13, 17, 22])
    df_g = df_g.reset_index()
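
The row numbers passed to `drop` here do not have to be read off the table by eye. As a small sketch, they can be derived from the printed index_npm2 list by recording every slot whose row position has already appeared. The function name `repeated_positions` is hypothetical; the list is copied from the output above.

```python
def repeated_positions(index_list):
    """Slots in `index_list` whose value has already appeared earlier in the list."""
    seen = set()
    repeats = []
    for slot, row in enumerate(index_list):
        if row in seen:
            repeats.append(slot)
        seen.add(row)
    return repeats


# Copied from the printed index_npm2 list.
index_npm2 = [9, 9, 11, 12, 5, 1, 4, 0, 2, 2, 19, 19, 7, 7, 20, 22, 10, 10,
              8, 6, 14, 3, 3, 15, 18]

g_repeats = repeated_positions(index_npm2)
print(g_repeats)   # [1, 9, 11, 13, 17, 22]
```

For the G-rated list this reproduces the same six row numbers that are dropped in the cell above, leaving 19 rows.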
    

    Deleting the 'index' column, which reset_index() adds whenever the index of a dataframe is reset.

    In [435]:
    del df_g['index']
    

    The first ten rows in the df_g dataframe.

    In [280]:
    df_g.head(10)
    
    Out[280]:
    Name Revenue Profit Net Profit Margin %
    0 Pollyanna $3,750,000 $1,250,000 33%
    1 Charlotte's Web $143,985,708 $58,985,708 40%
    2 Kit Kittredge: An American Girl $17,657,973 $7,657,973 43%
    3 Ramona and Beezus $27,469,621 $12,469,621 45%
    4 Prancer $18,587,135 $11,587,135 62%
    5 The Little Rascals $66,947,950 $43,947,950 65%
    6 A Sunday in the Country $2,411,143 $1,711,143 70%
    7 The Rookie $80,693,537 $58,693,537 72%
    8 My Fair Lady 1964 $72,071,636 $55,071,636 76%
    9 The Hunchback of Notre Drame $325,500,000 $255,500,000 78%

    Styling the df_g1 dataframe with a function that assigns colors by row index.

    In [436]:
    def highlight_cells13(x):
        df = x.copy()
        df.loc[:,:] = '' 
        df.iloc[0,:] = 'background-color:#F1B5B4 ;color:black;border-bottom: 2px solid black'
        
        df.iloc[1,:] = 'background-color:#ff6961;color:black;border-bottom: 2px solid black'
        df.iloc[2,:] = 'background-color:#ff6961;color:black;border-bottom: 2px solid black'
        df.iloc[3,:] = 'background-color:#ff6961;color:black;border-bottom: 2px solid black'
        
        df.iloc[4,:] = 'background-color:#ef3038;color:black;border-bottom: 2px solid black'
        df.iloc[5,:] = 'background-color:#ef3038;color:black;border-bottom: 2px solid black'
        df.iloc[6,:] = 'background-color:#ef3038;color:black;border-bottom: 2px solid black'
        df.iloc[7,:] = 'background-color:#ef3038;color:black;border-bottom: 2px solid black'
        df.iloc[8,:] = 'background-color:#ef3038;color:black;border-bottom: 2px solid black'
        return df 
    df_g1 = df_g[:9].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
                {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
                #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
                {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
                 ])\
                .apply(highlight_cells13, axis=None)
    

    Saving the df_g1 dataframe to the df_g1.png file as an image to be used for the analysis later on.

    In [437]:
    dfi.export(df_g1, 'df_g1.png')
    

    The df_g1 dataframe.

    Styling the df_g2 dataframe with a function that assigns colors by row index.

    In [438]:
    def highlight_cells13(x):
        df = x.copy()
        df.loc[:,:] = '' 
        df.iloc[0,:] = 'background-color:#ef3038;color:black;border-bottom: 2px solid black'
        df.iloc[1,:] = 'background-color:#ef3038;color:black;border-bottom: 2px solid black'
        
        df.iloc[2,:] = 'background-color:#cc6666;color:white;border-bottom: 2px solid black'
        df.iloc[3,:] = 'background-color:#cc6666;color:white;border-bottom: 2px solid black'
        df.iloc[4,:] = 'background-color:#cc6666;color:white;border-bottom: 2px solid black'
        df.iloc[5,:] = 'background-color:#cc6666;color:white;border-bottom: 2px solid black'
        df.iloc[6,:] = 'background-color:#cc6666;color:white;border-bottom: 2px solid black'
        df.iloc[7,:] = 'background-color:#cc6666;color:white;border-bottom: 2px solid black'
        df.iloc[8,:] = 'background-color:#cc6666;color:white;border-bottom: 2px solid black'
        df.iloc[9,:] = 'background-color:#cc6666;color:white;border-bottom: 2px solid black'
        return df 
    df_g2 = df_g[9:].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
                {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
                #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
                {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
                 ])\
                .apply(highlight_cells13, axis=None)
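
The colour bands in the df_g1 and df_g2 styling functions (1, 3 and 5 rows, then 2 and 8 rows) appear to follow the 20-point margin categories, with the 60% - 80% band split across the two tables. The sketch below counts the rows per band directly from sorted margins. The function name `band_sizes` is hypothetical, and `g_margins` is derived here from the printed npm_sort2 list with the six dropped rows removed; it is not a variable in the notebook.

```python
from itertools import groupby


def band_sizes(sorted_margins, width=20):
    """Return (band_start, row_count) pairs for consecutive rows in the same band."""
    def band(margin):
        # Assumption: a margin of exactly 100 is counted in the top band.
        return min(margin // width, 100 // width - 1)

    return [(key * width, len(list(rows)))
            for key, rows in groupby(sorted_margins, key=band)]


# G-rated margins in ascending order after the repeated rows are removed.
g_margins = [33, 40, 43, 45, 62, 65, 70, 72, 76, 78, 79, 80, 85, 87, 92, 94,
             95, 97, 99]

g_bands = band_sizes(g_margins)
print(g_bands)   # [(20, 1), (40, 3), (60, 7), (80, 8)]
```

The 7 rows in the 60% band correspond to the last 5 rows of df_g1 plus the first 2 rows of df_g2.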
    

    Saving the df_g2 dataframe to the df_g2.png file as an image to be used for the analysis later on.

    In [439]:
    dfi.export(df_g2, 'df_g2.png')
    

    The df_g2 dataframe.

    The 'df4' dataframe. (this dataframe is interactive)

    In [445]:
    df4
    
    Out[445]:
    Name Revenue Profit Int Revenue Int Profit System Ratings Net Profit Margin %

    Sorting the npm_sort3 list, which holds the Net Profit Margins of the PG-13 rated movies, in ascending order.

    In [406]:
    npm_sort3 = sorted(npm3)  # sorted copy; npm3 keeps its original order
    print(npm_sort3)
    
    [9, 10, 16, 19, 27, 30, 36, 39, 43, 46, 48, 53, 55, 56, 56, 57, 58, 59, 60, 61, 62, 62, 63, 63, 67, 68, 69, 69, 70, 71, 72, 73, 73, 74, 74, 74, 75, 75, 76, 76, 76, 76, 78, 80, 81, 82, 82, 83, 83, 84, 84, 85, 86, 88, 88, 88, 88, 89, 90, 91, 93, 94, 94, 96]
    

    Getting, for each value in the sorted npm_sort3 list, its position in the original npm3 list.

    In [407]:
    index_npm3 = []
    for i in npm_sort3:
        # list.index returns the first match, so tied margins repeat the same position
        index_npm3.append(npm3.index(i))
    print(index_npm3)
    
    [9, 32, 28, 58, 39, 7, 23, 3, 17, 18, 44, 6, 35, 2, 2, 15, 41, 51, 42, 14, 27, 27, 43, 43, 12, 48, 29, 29, 22, 37, 5, 49, 49, 26, 26, 26, 10, 10, 4, 4, 4, 4, 13, 19, 53, 25, 25, 47, 47, 0, 0, 45, 16, 1, 1, 1, 1, 50, 31, 63, 62, 34, 34, 55]
    

    Re-arranging the dataframe df4 by the Net Profit Margin of the movies, in ascending order, and storing the result in a new dataframe, df_pg13.

    In [408]:
    df_pg13 = df4.iloc[index_npm3]
    

    Resetting the index of the df_pg13 dataframe.

    In [409]:
    df_pg13 = df_pg13.reset_index()
    

    Deleting unwanted columns from the newly created dataframe df_pg13.

    In [410]:
    del df_pg13['index']
    del df_pg13['Int Revenue']
    del df_pg13['Int Profit']
    del df_pg13['System Ratings']
    

    Deleting the duplicated rows and resetting the index of the dataframe df_pg13.

    In [411]:
    df_pg13 =  df_pg13.drop([14, 20, 22, 26, 31, 33, 34, 36, 38, 39, 40, 45, 48, 53, 54, 55, 61])
    df_pg13 = df_pg13.reset_index()
    

    Deleting the 'index' column, which reset_index() adds whenever the index of a dataframe is reset.

    In [412]:
    del df_pg13['index']
    

    The first ten rows in the df_pg13 dataframe.

    In [287]:
    df_pg13.head(10)
    
    Out[287]:
    Name Revenue Profit Net Profit Margin %
    0 Charlie St. Cloud $48,478,084 $4,478,084 9%
    1 The Light Between Oceans $22,281,732 $2,281,732 10%
    2 Draft Day $29,847,480 $4,847,480 16%
    3 Rabbit Hole $6,205,034 $1,205,034 19%
    4 Country Strong $20,601,987 $5,601,987 27%
    5 Anna Karenina $71,004,627 $22,004,627 30%
    6 The Best of Me $41,059,418 $15,059,418 36%
    7 Burlesque $90,552,675 $35,552,675 39%
    8 The Tree of Life $61,721,826 $26,721,826 43%
    9 The Longest Ride $63,802,928 $29,802,928 46%

    Styling the df_pg131 dataframe with a function that assigns colors by row index.

    In [413]:
    def highlight_cells13(x):
        df = x.copy()
        df.loc[:,:] = '' 
        df.iloc[0,:] = 'background-color:#E97451;color:black;border-bottom: 2px solid black'
        df.iloc[1,:] = 'background-color:#E97451;color:black;border-bottom: 2px solid black'
        df.iloc[2,:] = 'background-color:#E97451;color:black;border-bottom: 2px solid black'
        df.iloc[3,:] = 'background-color:#E97451;color:black;border-bottom: 2px solid black'
        
        df.iloc[4,:] = 'background-color:#CD5C5C;color:black;border-bottom: 2px solid black'
        df.iloc[5,:] = 'background-color:#CD5C5C;color:black;border-bottom: 2px solid black'
        df.iloc[6,:] = 'background-color:#CD5C5C;color:black;border-bottom: 2px solid black'
        df.iloc[7,:] = 'background-color:#CD5C5C;color:black;border-bottom: 2px solid black'
        
        df.iloc[8,:] = 'background-color:#B22222;color:white;border-bottom: 2px solid black'
        df.iloc[9,:] = 'background-color:#B22222;color:white;border-bottom: 2px solid black'
        df.iloc[10,:] = 'background-color:#B22222;color:white;border-bottom: 2px solid black'
        df.iloc[11,:] = 'background-color:#B22222;color:white;border-bottom: 2px solid black'
        df.iloc[12,:] = 'background-color:#B22222;color:white;border-bottom: 2px solid black'
        df.iloc[13,:] = 'background-color:#B22222;color:white;border-bottom: 2px solid black'
        df.iloc[14,:] = 'background-color:#B22222;color:white;border-bottom: 2px solid black'
        df.iloc[15,:] = 'background-color:#B22222;color:white;border-bottom: 2px solid black'
        df.iloc[16,:] = 'background-color:#B22222;color:white;border-bottom: 2px solid black'
        
        df.iloc[17,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[18,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[19,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[20,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[21,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[22,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[23,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        #df.iloc[24,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        #df.iloc[25,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        return df 
    df_pg131 = df_pg13[:24].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
                {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
                #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
                {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
                 ])\
                .apply(highlight_cells13, axis=None)
    

    Saving the df_pg131 dataframe to the df_pg131.png file as an image to be used for the analysis later on.

    In [414]:
    dfi.export(df_pg131, 'df_pg131.png')
    

    The df_pg131 dataframe.

    Styling the df_pg132 dataframe with a function that assigns colors by row index.

    In [415]:
    def highlight_cells13(x):
        df = x.copy()
        df.loc[:,:] = '' 
        df.iloc[0,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[1,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[2,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[3,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[4,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[5,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[6,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        df.iloc[7,:] = 'background-color:#C04000;color:white;border-bottom: 2px solid black'
        
        df.iloc[8,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[9,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[10,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[11,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[12,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[13,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[14,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[15,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[16,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[17,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[18,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[19,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[20,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[21,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        df.iloc[22,:] = 'background-color:#8B0000;color:white;border-bottom: 2px solid black'
        return df 
    df_pg132 = df_pg13[24:].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
                {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
                #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
                {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
                 ])\
                .apply(highlight_cells13, axis=None)
    

    Saving the df_pg132 dataframe to the df_pg132.png file as an image to be used for the analysis later on.

    In [416]:
    dfi.export(df_pg132, 'df_pg132.png')
    

    The df_pg132 dataframe.

    The 'df5' dataframe. (this dataframe is interactive)

    In [446]:
    df5
    
    Out[446]:
    Name Revenue Profit Int Revenue Int Profit System Ratings Net Profit Margin %

    Sorting the npm_sort4 list, which holds the Net Profit Margins of the NC-17 rated movies, in ascending order.

    In [417]:
    npm_sort4 = sorted(npm4)  # sorted copy; npm4 keeps its original order
    print(npm_sort4)
    
    [0, 0, 2, 6, 27, 32, 40, 50, 56, 58, 61, 68, 68, 68, 68, 73, 76, 76, 77, 79, 79, 86, 88, 90, 92, 92, 92, 93, 93, 94, 94, 95, 96, 97]
    

    Getting, for each value in the sorted npm_sort4 list, its position in the original npm4 list.

    In [418]:
    index_npm4 = []
    for i in npm_sort4:
        # list.index returns the first match, so tied margins repeat the same position
        index_npm4.append(npm4.index(i))
    print(index_npm4)
    
    [2, 2, 11, 16, 1, 22, 4, 24, 15, 33, 31, 0, 0, 0, 0, 21, 18, 18, 8, 10, 10, 19, 25, 6, 3, 3, 3, 14, 14, 17, 17, 32, 28, 29]
    

    Re-arranging the dataframe df5 by the Net Profit Margin of the movies, in ascending order, and storing the result in a new dataframe, df_nc17.

    In [419]:
    df_nc17 = df5.iloc[index_npm4]
    

    Resetting the index of the df_nc17 dataframe.

    In [420]:
    df_nc17 = df_nc17.reset_index()
    

    Deleting unwanted columns from the newly created dataframe df_nc17.

    In [421]:
    del df_nc17['index']
    del df_nc17['Int Revenue']
    del df_nc17['Int Profit']
    del df_nc17['System Ratings']
    

    Deleting the duplicated rows and resetting the index of the dataframe df_nc17.

    In [422]:
    df_nc17 =  df_nc17.drop([0, 11, 12, 13, 16, 19, 24, 25, 27, 29])
    df_nc17 = df_nc17.reset_index()
    

    Deleting the 'index' column, which reset_index() adds whenever the index of a dataframe is reset.

    In [423]:
    del df_nc17['index']
    

    The first ten rows in the df_nc17 dataframe.

    In [304]:
    df_nc17.head(10)
    
    Out[304]:
    Name Revenue Profit Net Profit Margin %
    0 Whore $1,008,404 $8,404 0%
    1 Whore $1,008,404 $8,404 0%
    2 The Dreamers $15,307,113 $307,113 2%
    3 Elles $3,822,241 $256,669 6%
    4 Matador $17,356,268 $4,856,268 27%
    5 Natural Born Killers $50,283,563 $16,283,563 32%
    6 Wide Sargasso Sea $1,614,784 $659,312 40%
    7 Bad Lieutenant $2,038,916 $1,038,916 50%
    8 Two Girls and a Guy $2,315,026 $1,315,026 56%
    9 Law of Desire $1,470,809 $858,737 58%

    Styling the df_nc171 dataframe with a function that assigns colors by row index.

    In [424]:
    def highlight_cells13(x):
        df = x.copy()
        df.loc[:,:] = '' 
        df.iloc[0,:] = 'background-color:#FCE6F2;color:black;border-bottom: 2px solid black'
        df.iloc[1,:] = 'background-color:#FCE6F2;color:black;border-bottom: 2px solid black'
        df.iloc[2,:] = 'background-color:#FCE6F2;color:black;border-bottom: 2px solid black'
        
        df.iloc[3,:] = 'background-color:#DB7093;color:white;border-bottom: 2px solid black'
        df.iloc[4,:] = 'background-color:#DB7093;color:white;border-bottom: 2px solid black'
        
        df.iloc[5,:] = 'background-color:#E0115F;color:white;border-bottom: 2px solid black'
        df.iloc[6,:] = 'background-color:#E0115F;color:white;border-bottom: 2px solid black'
        df.iloc[7,:] = 'background-color:#E0115F;color:white;border-bottom: 2px solid black'
        df.iloc[8,:] = 'background-color:#E0115F;color:white;border-bottom: 2px solid black'
        
        df.iloc[9,:] = 'background-color:#953553;color:white;border-bottom: 2px solid black'
        df.iloc[10,:] = 'background-color:#953553;color:white;border-bottom: 2px solid black'
        df.iloc[11,:] = 'background-color:#953553;color:white;border-bottom: 2px solid black'
        
        return df 
    df_nc171 = df_nc17[:12].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
                {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
                #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
                {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
                 ])\
                .apply(highlight_cells13, axis=None)
    

    Saving the df_nc171 dataframe to the df_nc171.png file as an image to be used for the analysis later on.

    In [425]:
    dfi.export(df_nc171, 'df_nc171.png')
    

    The df_nc171 dataframe.

    Styling the df_nc172 dataframe with a function that assigns colors by row index.

    In [426]:
    def highlight_cells13(x):
        # Build a frame of CSS strings with the same shape as x
        df = x.copy()
        df.loc[:,:] = '' 
        text_style = 'color:white;border-bottom: 2px solid black'
        # Rows 0-2 share one background colour, rows 3-11 share another
        df.iloc[0:3,:] = 'background-color:#953553;' + text_style
        df.iloc[3:12,:] = 'background-color:#702963;' + text_style
        return df 
    df_nc172 = df_nc17[12:].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
                {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
                #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
                {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
                 ])\
                .apply(highlight_cells13, axis=None)
    

    Saving the df_nc172 dataframe to the df_nc172.png file as an image to be used for the analysis later on.

    In [427]:
    dfi.export(df_nc172, 'df_nc172.png')
    

    The df_nc172 dataframe.

    This is the HTML that loads the Highcharts library and creates the container for a pie chart showing the percentage of R-rated Drama movies in the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%. The chart is built with JavaScript and HTML below. (The chart is interactive: hover over the pie chart to see the values.)

    In [447]:
    %%html
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
    
    <span class="highcharts-figure">
        <div id="ruth"></div>
        <p class="highcharts-description">
        </p>
    </span>
    

    This is the JavaScript that uses the Highcharts library to draw the pie chart of the percentage of R-rated Drama movies in the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%.

    In [448]:
    %%js
    Highcharts.chart('ruth', {
        chart: {
            width:650,
            height:450,
            styledMode: false,
            plotBackgroundColor: null,
            plotBorderWidth: null,
            plotShadow: false,
            type: 'pie'
           
        },
        title: {
            text: '<span style="color:#C20404">Net Profit Margin of R-rated Drama Movies </span>'
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
       tooltip: {
            pointFormat: '{point.name}: <b>{point.percentage:.1f}%</b>'
        },
        accessibility: {
            point: {
                valueSuffix: '%'
            }
        },
        plotOptions: {
            pie: {
                allowPointSelect: true,
                cursor: 'pointer',
                dataLabels: {
                    enabled: true,
                    format: '<b>{point.name}</b>: {point.percentage:.1f} %'
                    
                },
                showInLegend: true
            }
        },
        series: [{
            name: 'Net Profit Margin',
            colorByPoint: true,
            colors: ['#EFB8B8','#E66A6A','#FF0000','#C20404', '#690000'],
            data: [{
                name: '0%-20%',
                y: 6,
            }, {
                name: '20%-40%',
                y: 8,
            },{
                name: '40%-60%',
                y: 4,
            },{
                name: '60%-80%',
                y: 8,
            }, {
                name: '80%-100%',
                y: 14,
                sliced: true,
                selected: true
            }]
        }]
    });
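The percentages shown on the pie chart are derived by Highcharts from the counts passed as `y`. As a minimal sketch, the same arithmetic can be reproduced in Python. The helper names `npm_percent`, `npm_bucket` and `shares` are hypothetical, the definition of Net Profit Margin as profit divided by revenue is an assumption (the notebook computes it elsewhere), and placing a value that sits exactly on a boundary into the higher bucket is also an assumption. The counts are the ones used for the R-rated chart above.

```python
def npm_percent(profit, revenue):
    """Net Profit Margin as a percentage (assumed definition: profit / revenue)."""
    return 100.0 * profit / revenue


def npm_bucket(npm):
    """Map an NPM between 0 and 100 to one of the five chart labels."""
    if not 0 <= npm <= 100:
        raise ValueError("NPM outside the 0-100% range shown in the charts")
    lower = min(int(npm // 20) * 20, 80)  # exactly 100% falls into the last bucket
    return f"{lower}%-{lower + 20}%"


def shares(counts):
    """Convert per-bucket movie counts into percentages of the total."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Counts taken from the R-rated pie chart data above
r_counts = {"0%-20%": 6, "20%-40%": 8, "40%-60%": 4, "60%-80%": 8, "80%-100%": 14}
r_shares = shares(r_counts)
print(r_shares)

# Hypothetical movie: profit of 50 on revenue of 200 gives an NPM of 25%
print(npm_bucket(npm_percent(50, 200)))
```

The same `shares` call can be applied to the counts of the other rating groups to compare their slices directly.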
    

    This is the HTML that loads the Highcharts library and creates the container for a pie chart showing the percentage of PG-rated Drama movies in the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%. The chart is built with JavaScript and HTML below. (The chart is interactive: hover over the pie chart to see the values.)

    In [449]:
    %%html
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
    
    <span class="highcharts-figure">
        <div id="ruth1"></div>
        <p class="highcharts-description">
        </p>
    </span>
    

    This is the JavaScript that uses the Highcharts library to draw the pie chart of the percentage of PG-rated Drama movies in the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%.

    In [450]:
    %%js
    Highcharts.chart('ruth1', {
        chart: {
            width:650,
            height:450,
            styledMode: false,
            plotBackgroundColor: null,
            plotBorderWidth: null,
            plotShadow: false,
            type: 'pie'
           
        },
        title: {
            text: '<span style="color:#FF5000">Net Profit Margin of PG-rated Drama Movies </span>'
        },
       tooltip: {
            pointFormat: '{point.name}: <b>{point.percentage:.1f}%</b>'
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        accessibility: {
            point: {
                valueSuffix: '%'
            }
        },
        plotOptions: {
            pie: {
                allowPointSelect: true,
                cursor: 'pointer',
                dataLabels: {
                    enabled: true,
                    format: '<b>{point.name}</b>: {point.percentage:.1f} %'
                    
                },
                showInLegend: true
            }
        },
        series: [{
            name: 'Net Profit Margin',
            colorByPoint: true,
            colors: ['#F8C9B4','#F5966B','#FF5000','#C33F03',
                     '#8C2E02'],
            data: [{
                name: '0%-20%',
                y: 2,
            }, {
                name: '20%-40%',
                y: 3,
            },{
                name: '40%-60%',
                y: 7,
            },{
                name: '60%-80%',
                y: 10,
            }, {
                name: '80%-100%',
                y: 11,
                sliced: true,
                selected: true
            }]
        }]
    });
    

    This is the HTML that loads the Highcharts library and creates the container for a pie chart showing the percentage of G-rated Drama movies in the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%. The chart is built with JavaScript and HTML below. (The chart is interactive: hover over the pie chart to see the values.)

    In [451]:
    %%html
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
    
    <span class="highcharts-figure">
        <div id="ruth2"></div>
        <p class="highcharts-description">
        </p>
    </span>
    

    This is the JavaScript that uses the Highcharts library to draw the pie chart of the percentage of G-rated Drama movies in the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%.

    In [452]:
    %%js
    Highcharts.chart('ruth2', {
        chart: {
            width:650,
            height:450,
            styledMode: false,
            plotBackgroundColor: null,
            plotBorderWidth: null,
            plotShadow: false,
            type: 'pie'
           
        },
        title: {
            text: '<span style="color:#ef3038">Net Profit Margin of G-rated Drama Movies </span>'
        },
       tooltip: {
            pointFormat: '{point.name}: <b>{point.percentage:.1f}%</b>'
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        accessibility: {
            point: {
                valueSuffix: '%'
            }
        },
        plotOptions: {
            pie: {
                allowPointSelect: true,
                cursor: 'pointer',
                dataLabels: {
                    enabled: true,
                    format: '<b>{point.name}</b>: {point.percentage:.1f} %'
                    
                },
                showInLegend: true
            }
        },
        series: [{
            name: 'Net Profit Margin',
            colorByPoint: true,
            colors: ['#F1B5B4','#ff6961','#ef3038','#cc6666'],
            data: [{
                name: '20%-40%',
                y: 1,
            },{
                name: '40%-60%',
                y: 3,
            },{
                name: '60%-80%',
                y: 7,
            }, {
                name: '80%-100%',
                y: 8,
                sliced: true,
                selected: true
            }]
        }]
    });
    

    This is the HTML that loads the Highcharts library and creates the container for a pie chart showing the percentage of PG-13 rated Drama movies in the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%. The chart is built with JavaScript and HTML below. (The chart is interactive: hover over the pie chart to see the values.)

    In [453]:
    %%html
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
    
    <span class="highcharts-figure">
        <div id="ruth3"></div>
        <p class="highcharts-description">
        </p>
    </span>
    

    This is the JavaScript that uses the Highcharts library to draw the pie chart of the percentage of PG-13 rated Drama movies in the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%.

    In [454]:
    %%js
    Highcharts.chart('ruth3', {
        chart: {
            width:650,
            height:450,
            styledMode: false,
            plotBackgroundColor: null,
            plotBorderWidth: null,
            plotShadow: false,
            type: 'pie'
           
        },
        title: {
            text: '<span style="color:#8B0000">Net Profit Margin of PG-13 Rated Drama Movies </span>'
        },
       tooltip: {
            pointFormat: '{point.name}: <b>{point.percentage:.1f}%</b>'
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        accessibility: {
            point: {
                valueSuffix: '%'
            }
        },
        plotOptions: {
            pie: {
                allowPointSelect: true,
                cursor: 'pointer',
                dataLabels: {
                    enabled: true,
                    format: '<b>{point.name}</b>: {point.percentage:.1f} %'
                    
                },
                showInLegend: true
            }
        },
        series: [{
            name: 'Net Profit Margin',
            colorByPoint: true,
            colors: ['#E97451','#CD5C5C','#B22222','#C04000',
                     '#8B0000'],
            data: [{
                name: '0%-20%',
                y: 4,
            }, {
                name: '20%-40%',
                y: 4,
            },{
                name: '40%-60%',
                y: 9,
            },{
                name: '60%-80%',
                y: 15,
            }, {
                name: '80%-100%',
                y: 15,
                sliced: true,
                selected: true
            }]
        }]
    });
    

    This is the HTML that loads the Highcharts library and creates the container for a pie chart showing the percentage of NC-17 rated Drama movies in the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%. The chart is built with JavaScript and HTML below. (The chart is interactive: hover over the pie chart to see the values.)

    In [455]:
    %%html
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
    
    <span class="highcharts-figure">
        <div id="ruth4"></div>
        <p class="highcharts-description">
        </p>
    </span>
    

    This is the JavaScript that uses the Highcharts library to draw the pie chart of the percentage of NC-17 rated Drama movies in the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%.

    In [456]:
    %%js
    Highcharts.chart('ruth4', {
        chart: {
            width:650,
            height:450,
            styledMode: false,
            plotBackgroundColor: null,
            plotBorderWidth: null,
            plotShadow: false,
            type: 'pie'
           
        },
        title: {
            text: '<span style="color:#E0115F">Net Profit Margin of NC-17 Rated Drama Movies </span>'
        },
       tooltip: {
            pointFormat: '{point.name}: <b>{point.percentage:.1f}%</b>'
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        accessibility: {
            point: {
                valueSuffix: '%'
            }
        },
        plotOptions: {
            pie: {
                allowPointSelect: true,
                cursor: 'pointer',
                dataLabels: {
                    enabled: true,
                    format: '<b>{point.name}</b>: {point.percentage:.1f} %'
                    
                },
                showInLegend: true
            }
        },
        series: [{
            name: 'Net Profit Margin',
            colorByPoint: true,
            colors: ['#FCE6F2','#DB7093','#E0115F','#953553',
                     '#702963'],
            data: [{
                name: '0%-20%',
                y: 3,
            }, {
                name: '20%-40%',
                y: 2,
            },{
                name: '40%-60%',
                y: 4,
            },{
                name: '60%-80%',
                y: 6,
            }, {
                name: '80%-100%',
                y: 9,
                sliced: true,
                selected: true
            }]
        }]
    });
    

    2. Conclusion: Net Profit Margin

    A Dataframe on the Net Profit Margin of R-rated Drama Movies

    A Dataframe on the Net Profit Margin of PG-rated Drama Movies

    A Dataframe on the Net Profit Margin of G-rated Drama Movies

    A Dataframe on the Net Profit Margin of PG-13 rated Drama Movies

    A Dataframe on the Net Profit Margin of NC-17 rated Drama Movies

    Blueprint: Budget and Revenue of Movies

    This is the blueprint for creating the third visualization, Budget and Revenue of Movies. Altair will be used to create this graph.

    Blueprint:

    • The format of the dataframe needed for this graph is straightforward. Based on the intended design of the chart, these are the columns needed for the dataframe:

      • Name,
      • System Ratings,
      • Budget,
      • Revenue,
      • Budget of Movie,
      • Revenue of Movies
    • The style of this chart is a Selection Histogram, which is found in the Altair gallery. It is a scatter plot with the Revenue on the x-axis and the Budget on the y-axis. This setup shows whether the relationship between the two is linear, that is, whether the hypothesis "the higher the budget, the higher the revenue" holds.

    • The graph has an attached histogram showing the number of items in each category within the selection. To make a selection, draw a box by dragging the mouse. When the mouse hovers over a point, a tooltip shows the name, system rating, budget and revenue of the movie.
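The hypothesis in the blueprint (the higher the budget, the higher the revenue) can also be checked numerically before reading it off the scatter plot. The following is a minimal sketch using only the standard library: it computes the Pearson correlation and the least-squares line with Revenue on the x-axis and Budget on the y-axis, matching the axes described above. The function names and the five data points (in millions of dollars) are hypothetical and are not taken from 'Drama_DataFrame'.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values in millions of dollars, for illustration only
revenue = [3, 9, 25, 38, 90]
budget = [1, 5, 10, 20, 40]

r = pearson_r(revenue, budget)
slope, intercept = least_squares_line(revenue, budget)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A correlation close to 1 with a positive slope would support the hypothesis; a value near 0 would suggest that budget alone says little about revenue.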

    This is the 'Drama_DataFrame' dataframe. (This dataframe is interactive.)

    In [457]:
    Drama_DataFrame
    
    Out[457]:
    Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x Worldwide_Gross Worldwide_Gross_x Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer

    Getting the opening weekend of movies that are R-rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe. This data was obtained through research.

    In [18]:
    r_opening_weekend = [30122888, 37513109, 14953664, 46607250, 38560195, 13143310, 24400000, 85171450, 736311,
                        24900566, 10470145, 492648, 1220335, 19497324, 9700000, 5100000, 1443809, 237264, 118298,
                        224476, 2002165, 160547, 253510, 47122, 13575172, 257174, 256498, 24587, 7485546, 473882, 
                        52041, 387618, 8800230, 561906, 135388, 246914, 6661234, 84797, 156833, 1767308, 81006, 
                        18623, 100268, 3762145, 193728, 137651, 63461, 36134, 104030, 170335, 118150, 13307125, 
                        2105729,63356, 2337594, 287081]
    print(r_opening_weekend)#showing the r_opening_weekend list
    
    [30122888, 37513109, 14953664, 46607250, 38560195, 13143310, 24400000, 85171450, 736311, 24900566, 10470145, 492648, 1220335, 19497324, 9700000, 5100000, 1443809, 237264, 118298, 224476, 2002165, 160547, 253510, 47122, 13575172, 257174, 256498, 24587, 7485546, 473882, 52041, 387618, 8800230, 561906, 135388, 246914, 6661234, 84797, 156833, 1767308, 81006, 18623, 100268, 3762145, 193728, 137651, 63461, 36134, 104030, 170335, 118150, 13307125, 2105729, 63356, 2337594, 287081]
    

    Getting the opening weekend of movies that are PG-rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe. This data was obtained through research.

    In [19]:
    pg_opening_weekend = [11364505, 19152401, 27547866, 16007426, 11351389, 44542, 1203011, 0, 67877361, 11351389,
                         27547866, 16755310, 8146533,  0, 12177488, 5268764, 9178233, 13616196, 6011585, 22564512, 
                         9421369, 6836036, 16007426, 9244641, 14466, 24517121, 20584908, 124011, 721341, 82601, 
                         1528982,2739680, 5609875, 298277, 2189966, 89054, 93005, 89213, 0, 2534729, 16015408, 
                         518795, 12146143, 46977, 8556935, 5088381]
    print(pg_opening_weekend)#showing the pg_opening_weekend list
    
    [11364505, 19152401, 27547866, 16007426, 11351389, 44542, 1203011, 0, 67877361, 11351389, 27547866, 16755310, 8146533, 0, 12177488, 5268764, 9178233, 13616196, 6011585, 22564512, 9421369, 6836036, 16007426, 9244641, 14466, 24517121, 20584908, 124011, 721341, 82601, 1528982, 2739680, 5609875, 298277, 2189966, 89054, 93005, 89213, 0, 2534729, 16015408, 518795, 12146143, 46977, 8556935, 5088381]
    

    Getting the opening weekend of movies that are G-rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe. This data was obtained through research.

    In [20]:
    g_opening_weekend = [0, 2914486, 16021684, 162146, 10028065, 7810481, 0, 21037414, 8742545, 0, 679185, 11457353
                         , 220297, 16021684, 4625583, 0, 10103675, 1586753, 0, 0, 0, 0, 0, 0, 0]
    print(g_opening_weekend)#showing the g_opening_weekend list
    
    [0, 2914486, 16021684, 162146, 10028065, 7810481, 0, 21037414, 8742545, 0, 679185, 11457353, 220297, 16021684, 4625583, 0, 10103675, 1586753, 0, 0, 0, 0, 0, 0, 0]
    

    Getting the opening weekend of movies that are PG-13 rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe. This data was obtained through research.

    In [21]:
    pg13_opening_weekend = [55785112, 35258, 22403596, 11947744, 35574710, 526011, 220522, 320690, 24074047,
                           12381585, 15371203, 143818, 16842353, 29632823, 14789393, 7102085, 24830443, 372920, 
                           13019686, 11731703, 41202458, 13203458, 21401594, 10003827, 26044590, 30468614,
                           22618358, 12305016, 9783603, 13002632, 129462, 18723269, 4765838, 105005, 9851102, 
                           15002635, 8089139, 20874072, 30452, 30452, 5079566, 76244, 228359, 8310232, 5467084,
                           187281, 15679190, 11727390, 2215891, 68266, 14065500, 6213362, 13501349, 446380, 
                           4750894, 21688103, 212000, 4690214, 53778, 55438, 9112839, 20321, 128140, 77740]
    print(pg13_opening_weekend)#showing the pg13_opening_weekend list
    
    [55785112, 35258, 22403596, 11947744, 35574710, 526011, 220522, 320690, 24074047, 12381585, 15371203, 143818, 16842353, 29632823, 14789393, 7102085, 24830443, 372920, 13019686, 11731703, 41202458, 13203458, 21401594, 10003827, 26044590, 30468614, 22618358, 12305016, 9783603, 13002632, 129462, 18723269, 4765838, 105005, 9851102, 15002635, 8089139, 20874072, 30452, 30452, 5079566, 76244, 228359, 8310232, 5467084, 187281, 15679190, 11727390, 2215891, 68266, 14065500, 6213362, 13501349, 446380, 4750894, 21688103, 212000, 4690214, 53778, 55438, 9112839, 20321, 128140, 77740]
    

    Getting the opening weekend of movies that are NC-17 rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe. This data was obtained through research.

    In [22]:
    nc17_opening_weekend = [361000, 69100, 0, 0, 0, 85709, 738339, 143632, 63918, 361000, 100316, 142632, 361000,
                           100316, 193728, 649423, 24286, 11014818, 63918, 25775847, 361000, 0, 11166687, 31665,
                           245398, 0, 85709, 738339, 100000, 70188, 63918, 130303, 0, 0]
    print(nc17_opening_weekend)#showing the nc17_opening_weekend list
    
    [361000, 69100, 0, 0, 0, 85709, 738339, 143632, 63918, 361000, 100316, 142632, 361000, 100316, 193728, 649423, 24286, 11014818, 63918, 25775847, 361000, 0, 11166687, 31665, 245398, 0, 85709, 738339, 100000, 70188, 63918, 130303, 0, 0]
    

    Getting the Budget of movies that are R-rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe.

    In [23]:
    r_cost = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x=='R'and Drama_DataFrame.Profit[i] >= 0:
                r_cost.append(int(Drama_DataFrame.Production_Budget[i]))
    print(r_cost)#showing the r_cost list
    
    [100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52500000, 40000000, 37500000, 31000000, 23000000, 22500000, 22500000, 21000000, 20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000, 12000000, 11800000, 11000000, 10000000, 9400000, 8500000, 7000000, 5000000, 4900000, 4750000, 4000000, 3500000, 3400000, 3300000, 3000000, 2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 1987650, 1500000, 1000000, 1000000, 1000000, 135000, 100000, 6000000, 8500000, 20000000, 100000, 2700000, 11500000, 9000000]
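The loop above selects budgets by rating and non-negative profit, and the same pattern is repeated for the other ratings. As a hedged alternative, the selection can be written once with a boolean mask. The helper `budgets_for` and the small `demo` frame are hypothetical stand-ins; only the column names `Rating`, `Production_Budget` and `Profit` come from 'Drama_DataFrame'. Unlike `Drama_DataFrame.Profit[i]`, which looks rows up by index label, the mask does not rely on the dataframe having a default 0..n-1 index.

```python
import pandas as pd

# Hypothetical stand-in for Drama_DataFrame with only the columns used here
demo = pd.DataFrame({
    "Rating": ["R", "PG", "R", "G", "R"],
    "Production_Budget": [100000000, 37000000, 61000000, 700000, 5000000],
    "Profit": [50000000, 12000000, -3000000, 400000, 0],
})


def budgets_for(frame, rating):
    """Budgets of the movies with the given rating and a non-negative profit."""
    mask = (frame["Rating"] == rating) & (frame["Profit"] >= 0)
    return frame.loc[mask, "Production_Budget"].astype(int).tolist()


r_cost_demo = budgets_for(demo, "R")
print(r_cost_demo)  # the R-rated row with a negative profit is excluded
```

Calling `budgets_for(Drama_DataFrame, 'PG')`, and so on for the other ratings, should give the same lists as the loops, in the same row order.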
    

    Getting the Budget of movies that are PG-rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe.

    In [24]:
    pg_cost = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x=='PG'and Drama_DataFrame.Profit[i] >= 0:
                pg_cost.append(int(Drama_DataFrame.Production_Budget[i]))
    print(pg_cost)#showing the pg_cost list
    
    [180000000, 37000000, 20000000, 20000000, 3000000, 1700000, 5100000, 10000000, 95000000, 3000000, 20000000, 40000000, 5000000, 422000, 11800000, 15000000, 32000000, 40000000, 8000000, 17000000, 30000000, 500000, 20000000, 2000000, 23000000, 32000000, 90000000, 10000000, 16000000, 3000000, 15000000, 10000000, 20000000, 12000000, 5000000, 7000000, 14000000, 15000000, 12000000, 7500000, 17000000, 5000000, 22000000, 4500000, 8200000, 28000000]
    

    Getting the Budget of movies that are G-rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe.

    In [25]:
    g_cost = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x=='G'and Drama_DataFrame.Profit[i] >= 0:
                g_cost.append(int(Drama_DataFrame.Production_Budget[i]))
    print(g_cost)#showing the g_cost list
    
    [700000, 7000000, 22000000, 20000000, 23000000, 15000000, 2700000, 70000000, 30000000, 2500000, 666000, 85000000, 10000000, 22000000, 18000000, 8200000, 60000000, 45000000, 858000, 17000000, 10000000, 6400000, 13000000, 1750000, 1700000]
    

    Getting the Budget of movies that are PG-13 rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe.

    In [26]:
    pg13_cost = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x=='PG-13'and Drama_DataFrame.Profit[i] >= 0:
                pg13_cost.append(int(Drama_DataFrame.Production_Budget[i]))
    print(pg13_cost)#showing the pg13_cost list
    
    [110000000, 75000000, 60000000, 55000000, 50000000, 50000000, 50000000, 49000000, 47000000, 44000000, 40000000, 40000000, 38000000, 37000000, 37000000, 36000000, 35000000, 35000000, 34000000, 33000000, 30000000, 30000000, 28000000, 26000000, 25000000, 25000000, 25000000, 25000000, 25000000, 25000000, 24000000, 20000000, 20000000, 19000000, 17000000, 17000000, 16000000, 16000000, 15000000, 15000000, 15000000, 14000000, 13000000, 12000000, 12000000, 11000000, 11000000, 10000000, 10000000, 9700000, 9000000, 9000000, 7400000, 7000000, 6000000, 5000000, 5000000, 5000000, 5000000, 2600000, 2000000, 1400000, 250000, 175000]
    

    Getting the Budget of movies that are NC-17 rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe.

    In [27]:
    nc17_cost = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if x=='NC-17'and Drama_DataFrame.Profit[i] >= 0:
                nc17_cost.append(int(Drama_DataFrame.Production_Budget[i]))
    print(nc17_cost)#showing the nc17_cost list
    
    [6500000, 12500000, 1000000, 20000, 955472, 1500000, 9000000, 15000000, 15000000, 6500000, 4000000, 15000000, 6500000, 4074940, 1000000, 1000000, 3565572, 12000000, 15000000, 350000, 6500000, 904765, 34000000, 230000, 1000000, 1000000, 1500000, 6500000, 1250000, 12000, 15000000, 2200000, 50000, 612072]
    

    Creating the df_opening dataframe.

    In [491]:
    df_opening = pd.DataFrame({'Budget':r_cost+pg_cost+g_cost+pg13_cost+nc17_cost,
                       "Opening_Weekend":r_opening_weekend+pg_opening_weekend
                       +g_opening_weekend+pg13_opening_weekend+nc17_opening_weekend,
                       "Profit":profit_int+profit_int1+profit_int2+profit_int3+profit_int4
                       })
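Because the opening-weekend lists were typed in by hand while the budget lists are filtered from the dataframe, the concatenated columns only line up if every list has the same length; `pd.DataFrame` raises a `ValueError` when the lists in a dict differ in length. A minimal sketch of a check that can be run first is shown below. The helper `check_aligned` and the three short lists are hypothetical.

```python
def check_aligned(**columns):
    """Return the common length of all columns, or raise if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return next(iter(lengths.values()))


# Hypothetical short lists standing in for the concatenated columns
budget_demo = [100000000, 61000000, 60000000]
opening_demo = [30122888, 37513109, 14953664]
profit_demo = [5000000, 7000000, 1000000]

n_rows = check_aligned(Budget=budget_demo,
                       Opening_Weekend=opening_demo,
                       Profit=profit_demo)
print(n_rows)
```

Equal lengths do not guarantee that the rows refer to the same movies, so the hand-entered values still need to follow the row order of the filtered dataframe.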
    

    The 'df_opening' dataframe. (this dataframe is interactive)

    In [28]:
    df_opening
    
    Out[28]:
    Budget Opening_Weekend Profit

    Creating a 3D scatter plot of the Profit, Budget and Opening Weekend of the movies in the Drama genre from the 'Drama_DataFrame' dataframe, using the 'animate' function and 'animation.FuncAnimation' from Matplotlib to create an animated 3D scatter plot object.

    In [582]:
    def animate(i):
        # Rotate the camera: the azimuth advances 4 degrees per frame
        ax.view_init(elev=10, azim=i*4)
        return fig
     
    # Creating dataset
    z = df_opening['Profit']
    x = df_opening['Budget']
    y = df_opening['Opening_Weekend']
     
    # Creating figure (fig must exist before it is passed to FuncAnimation below)
    fig = plt.figure(figsize=(3.6, 3.6))
    ax = fig.add_subplot(projection="3d")
     
    # Creating plot
    ax.scatter3D(x, y, z, color = "red")
    
    ax.set_xlabel('Budget', size = 7.5)
    ax.set_ylabel('Opening_Weekend', size = 7.5)
    ax.set_zlabel('Profit', size = 7.5)
     
    # Tick positions and label sizes
    ax.set_xticks(np.arange(0, 180000000, 25000000))
    ax.tick_params('x', labelsize=7)
    ax.tick_params('y', labelsize=7)
    ax.tick_params('z', labelsize=7)
    
    #Creating the Animation object before showing the figure
    ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
    plt.show()
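The animation settings determine how far the camera turns and how long the saved GIF plays. As a small sketch of that arithmetic: with `frames=90` and `azim=i*4` the last frame sits at 356 degrees, one step short of a full turn, and saving at `fps=10` (the value used when the GIF is saved) gives a 9-second loop. The helper name `rotation_plan` is hypothetical.

```python
def rotation_plan(frames, degrees_per_frame, fps):
    """Return the azimuth of the last frame and the GIF duration in seconds."""
    azimuths = [i * degrees_per_frame for i in range(frames)]
    return azimuths[-1], frames / fps


last_azimuth, seconds = rotation_plan(frames=90, degrees_per_frame=4, fps=10)
print(last_azimuth, seconds)
```

Stopping at 356 degrees rather than 360 lets the loop restart at 0 without showing the same view twice.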
    

    Saving the animated 3D scatter plot gif as 'drama1.gif'.

    In [571]:
    #f = r"c://Users/xxDownloads/Project%201/Ani.gif" 
    # Pass the Pillow writer explicitly so Matplotlib does not look for ffmpeg first
    writergif = animation.PillowWriter(fps=10)
    #ani.save(f, writer=writergif)
    ani.save('drama1.gif', writer=writergif)
    #ani.save('first44.gif')
    

    The first 3D scatter plot (part A): the x-axis is the 'Budget', the y-axis is the 'Opening_Weekend' and the z-axis is the 'Profit'. The purpose of this animation is to see whether the budget and the opening weekend of a movie have an effect on its profit.
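
One way to put a number on the question this animation asks is a Pearson correlation between each input and the profit. The sketch below is a minimal pure-Python version. The three short lists are hypothetical stand-ins for the 'Budget', 'Opening_Weekend' and 'Profit' columns of df_opening, not values taken from this notebook, and the helper name `pearson` is an assumption for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values (in dollars), for illustration only.
budget = [10_000_000, 20_000_000, 40_000_000, 60_000_000]
opening_weekend = [2_000_000, 9_000_000, 15_000_000, 30_000_000]
profit = [5_000_000, 30_000_000, 45_000_000, 110_000_000]

print("Budget vs Profit:", round(pearson(budget, profit), 3))
print("Opening weekend vs Profit:", round(pearson(opening_weekend, profit), 3))
```

A value close to 1 or -1 suggests a strong linear relationship and a value near 0 a weak one. A correlation only describes association, so it supports the visual impression from the plot rather than proving an effect.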

    Creating a 3D scatter plot with a linear plane of the Profit, Budget and Opening Weekend of the movies in the Drama genre from the 'Drama_DataFrame' dataframe, using the 'animate' function and 'animation.FuncAnimation' to create an animated 3D scatter plot.

    In [138]:
    def animate(i):
        # rotate the camera: the azimuth angle goes from 0 to 356 degrees in 4-degree steps
        ax.view_init(elev=10, azim=i*4)
        return fig
    
    # Budget, Opening_Weekend and Profit as the rows of one array
    dataSet = np.array([df_opening['Budget'],df_opening['Opening_Weekend'],df_opening['Profit']])
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    # scatter of the movies plus the fitted plane
    # (x_surf, y_surf and fittedY are assumed to be defined in an earlier cell)
    scatter = ax.scatter(dataSet[0], dataSet[1], dataSet[2],alpha=0.5, s=40,color='red')
    linear = ax.plot_surface(x_surf,y_surf,fittedY, alpha=0.4 ,rstride=1, cstride=1, color='brown')
    ax.set_xlabel('Budget')
    ax.set_ylabel('Opening_Weekend')
    ax.set_zlabel('Profit')
    
    # Creating the Animation object (90 frames)
    ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
    ani
    
    Out[138]:
    <matplotlib.animation.FuncAnimation at 0x2991d6e4d90>

    Saving the animated 3D scatter plot gif with a linear plane as 'drama.gif'.

    In [140]:
    # Saving with an explicit Pillow writer, so matplotlib does not need to look for ffmpeg
    writergif = animation.PillowWriter(fps=10)
    ani.save('drama.gif', writer=writergif)
    

    The first 3D scatter plot (part B): the x-axis is the 'Budget', the y-axis is the 'Opening_Weekend' and the z-axis is the 'Profit'. The purpose of this animation is to see whether the budget and the opening weekend of a movie have an effect on its profit.
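
The cell that computes the plane drawn in this plot is not shown in this part of the notebook. One common way to obtain such a plane is an ordinary least-squares fit of Profit = a + b * Budget + c * Opening_Weekend, and the sketch below assumes that approach. It is pure Python, and the names `fit_plane`, `budget_m`, `opening_m` and `profit_m` as well as the numbers (in millions of dollars) are hypothetical.

```python
def fit_plane(xs, ys, zs):
    """Least-squares fit of z = a + b*x + c*y; returns (a, b, c)."""
    n = len(xs)
    # Sums that make up the normal equations for design rows [1, x, y].
    sx, sy, sz = sum(xs), sum(ys), sum(zs)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxz = sum(x * z for x, z in zip(xs, zs))
    syz = sum(y * z for y, z in zip(ys, zs))
    m = [[n, sx, sy, sz],
         [sx, sxx, sxy, sxz],
         [sy, sxy, syy, syz]]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

# Hypothetical data in millions of dollars, built to lie exactly on
# the plane profit = 2 + 3 * budget + 0.5 * opening weekend.
budget_m = [1, 2, 3, 4, 5]
opening_m = [2, 1, 4, 3, 6]
profit_m = [2 + 3 * x + 0.5 * y for x, y in zip(budget_m, opening_m)]

a, b, c = fit_plane(budget_m, opening_m, profit_m)
print(round(a, 6), round(b, 6), round(c, 6))
```

Working in millions keeps the sums small. With raw dollar amounts the normal equations involve very large numbers, and a library routine for least squares would be the more robust choice.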

    Getting the release month of the R-rated Drama movies with a non-negative profit from the 'Drama_DataFrame' dataframe and labeling the months from 1-12, going from January to December.

    In [3]:
    r_month = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x=='R'and Drama_DataFrame.Profit[i] >= 0:
            if Drama_DataFrame.Release_Date[i][:3] == 'Jan':r_month.append(1)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Feb':r_month.append(2)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Mar':r_month.append(3)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Apr':r_month.append(4)
            elif Drama_DataFrame.Release_Date[i][:3] == 'May':r_month.append(5)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Jun':r_month.append(6)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Jul':r_month.append(7)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Aug':r_month.append(8)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Sep':r_month.append(9)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Oct':r_month.append(10)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Nov':r_month.append(11)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Dec':r_month.append(12)
            else:r_month.append('Nan')
    

    Showing the 'r_month' list.

    In [363]:
    print(r_month)
    
    [12, 10, 5, 2, 2, 10, 12, 2, 9, 11, 10, 11, 4, 11, 8, 10, 12, 4, 10, 12, 9, 3, 11, 1, 6, 11, 11, 1, 10, 1, 9, 7, 2, 10, 10, 5, 3, 6, 10, 8, 4, 10, 9, 3, 12, 10, 5, 4, 7, 9, 5, 7, 12, 1, 10, 9]
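
The same month lookup is repeated for each rating in the cells that follow. As a minimal sketch, the chain of comparisons can be expressed as one dictionary lookup inside a reusable function. The names `MONTH_NUMBER`, `release_month` and `months_for_rating` are hypothetical, and so are the example rows. The only assumption about the date format is the one the notebook code already makes, namely that each release date starts with a three-letter month abbreviation.

```python
MONTH_NUMBER = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

def release_month(release_date, default="Nan"):
    """Map a date string that starts with a month abbreviation to 1-12."""
    return MONTH_NUMBER.get(release_date[:3], default)

def months_for_rating(ratings, profits, release_dates, rating):
    """Months of the movies with the given rating and a non-negative profit."""
    return [
        release_month(date)
        for r, profit, date in zip(ratings, profits, release_dates)
        if r == rating and profit >= 0
    ]

# Hypothetical rows, for illustration only.
ratings = ["R", "PG", "R", "R"]
profits = [1_000_000, 500_000, -200_000, 3_000_000]
release_dates = ["Dec 18, 2009", "Mar 3, 2012", "Jul 4, 2014", "Oct 10, 2015"]

print(months_for_rating(ratings, profits, release_dates, "R"))  # [12, 10]
```

With a helper like this, each rating needs only one call, for example `months_for_rating(ratings, profits, release_dates, "PG")`, instead of a separate copy of the loop.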
    

    Getting the release month of the PG-rated Drama movies with a non-negative profit from the 'Drama_DataFrame' dataframe and labeling the months from 1-12, going from January to December.

    In [4]:
    pg_month = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x=='PG'and Drama_DataFrame.Profit[i] >= 0:
            if Drama_DataFrame.Release_Date[i][:3] == 'Jan':pg_month.append(1)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Feb':pg_month.append(2)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Mar':pg_month.append(3)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Apr':pg_month.append(4)
            elif Drama_DataFrame.Release_Date[i][:3] == 'May':pg_month.append(5)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Jun':pg_month.append(6)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Jul':pg_month.append(7)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Aug':pg_month.append(8)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Sep':pg_month.append(9)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Oct':pg_month.append(10)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Nov':pg_month.append(11)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Dec':pg_month.append(12)
            else:pg_month.append('Nan')
    

    Showing the 'pg_month' list.

    In [360]:
    print(pg_month)
    
    [11, 9, 11, 3, 8, 2, 10, 6, 3, 8, 11, 12, 8, 12, 1, 10, 10, 6, 4, 2, 11, 9, 3, 3, 1, 7, 7, 5, 1, 2, 11, 10, 12, 10, 7, 9, 12, 2, 12, 6, 5, 7, 7, 3, 2, 5]
    

    Getting the release month of the G-rated Drama movies with a non-negative profit from the 'Drama_DataFrame' dataframe and labeling the months from 1-12, going from January to December.

    In [5]:
    g_month = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x=='G'and Drama_DataFrame.Profit[i] >= 0:
            if Drama_DataFrame.Release_Date[i][:3] == 'Jan':g_month.append(1)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Feb':g_month.append(2)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Mar':g_month.append(3)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Apr':g_month.append(4)
            elif Drama_DataFrame.Release_Date[i][:3] == 'May':g_month.append(5)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Jun':g_month.append(6)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Jul':g_month.append(7)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Aug':g_month.append(8)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Sep':g_month.append(9)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Oct':g_month.append(10)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Nov':g_month.append(11)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Dec':g_month.append(12)
            else:g_month.append('Nan')
    

    Showing the 'g_month' list.

    In [357]:
    print(g_month)
    
    [4, 11, 3, 11, 8, 7, 10, 6, 8, 5, 12, 10, 7, 3, 4, 4, 12, 6, 8, 12, 3, 11, 10, 9, 5]
    

    Getting the release month of the PG-13 rated Drama movies with a non-negative profit from the 'Drama_DataFrame' dataframe and labeling the months from 1-12, going from January to December.

    In [6]:
    pg13_month = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x=='PG-13'and Drama_DataFrame.Profit[i] >= 0:
            if Drama_DataFrame.Release_Date[i][:3] == 'Jan':pg13_month.append(1)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Feb':pg13_month.append(2)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Mar':pg13_month.append(3)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Apr':pg13_month.append(4)
            elif Drama_DataFrame.Release_Date[i][:3] == 'May':pg13_month.append(5)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Jun':pg13_month.append(6)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Jul':pg13_month.append(7)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Aug':pg13_month.append(8)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Sep':pg13_month.append(9)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Oct':pg13_month.append(10)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Nov':pg13_month.append(11)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Dec':pg13_month.append(12)
            else:pg13_month.append('Nan')
    

    Showing the 'pg13_month' list.

    In [353]:
    print(pg13_month)
    
    [10, 12, 9, 11, 11, 12, 10, 11, 11, 7, 10, 12, 4, 11, 1, 12, 12, 5, 4, 7, 2, 4, 2, 10, 8, 2, 4, 8, 4, 2, 12, 6, 9, 11, 4, 3, 2, 3, 2, 12, 8, 10, 9, 1, 7, 8, 11, 5, 4, 12, 10, 1, 1, 4, 9, 7, 1, 3, 12, 5, 9, 11, 10, 7]
    

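    Getting the release month of the NC-17 rated Drama movies with a non-negative profit from the 'Drama_DataFrame' dataframe and labeling the months from 1-12, going from January to December. Unlike the other ratings, a release date with an unrecognized month is recorded here as 9 (September) rather than 'Nan'.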

    In [7]:
    nc17_month = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x=='NC-17'and Drama_DataFrame.Profit[i] >= 0:
            if Drama_DataFrame.Release_Date[i][:3] == 'Jan':nc17_month.append(1)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Feb':nc17_month.append(2)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Mar':nc17_month.append(3)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Apr':nc17_month.append(4)
            elif Drama_DataFrame.Release_Date[i][:3] == 'May':nc17_month.append(5)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Jun':nc17_month.append(6)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Jul':nc17_month.append(7)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Aug':nc17_month.append(8)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Sep':nc17_month.append(9)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Oct':nc17_month.append(10)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Nov':nc17_month.append(11)
            elif Drama_DataFrame.Release_Date[i][:3] == 'Dec':nc17_month.append(12)
            else:nc17_month.append(9)
    

    Showing the 'nc17_month' list.

    In [354]:
    print(nc17_month)
    
    [12, 3, 10, 4, 4, 9, 3, 2, 10, 1, 10, 2, 12, 10, 12, 9, 4, 9, 9, 10, 12, 7, 8, 10, 11, 6, 7, 5, 1, 3, 9, 10, 10, 4]
    

    Creating the df_month dataframe.

    In [588]:
    df_month = pd.DataFrame({'Budget':r_cost+pg_cost+g_cost+pg13_cost+nc17_cost,
                       "Month_Realesed":r_month+pg_month+g_month+pg13_month+nc17_month,
                       "Revenue":world_int+world_int1+world_int2+world_int3+world_int4,
                       })
    

    The 'df_month' dataframe. (this dataframe is interactive)

    In [29]:
    df_month
    
    Out[29]:
    Budget Month_Realesed Revenue

    Creating a 3D scatter plot of the Budget, Month Released and Revenue of the movies in the Drama genre from the 'Drama_DataFrame' dataframe, using the 'animate' function and 'animation.FuncAnimation' to create an animated 3D scatter plot.

    In [180]:
    def animate(i):
        # rotate the camera: the azimuth angle goes from 0 to 356 degrees in 4-degree steps
        ax.view_init(elev=10, azim=i*4)
        return fig
    
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    
    # Creating plot
    scatter = ax.scatter(df_month['Budget'],df_month['Month_Realesed'],df_month['Revenue'], alpha=0.5,s=50, color='#C41E3A')
    ax.set_xlabel('Budget')
    ax.set_ylabel('Month_Realesed')
    ax.set_zlabel('Revenue')
    
    ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
    ani
    
    Out[180]:
    <matplotlib.animation.FuncAnimation at 0x29921e0aa30>

    Saving the animated 3D scatter plot gif as 'drama2.gif'.

    In [181]:
    # Saving with an explicit Pillow writer, so matplotlib does not need to look for ffmpeg
    writergif = animation.PillowWriter(fps=10)
    ani.save('drama2.gif', writer=writergif)
    

    The second 3D scatter plot (part A): the x-axis is the 'Budget', the y-axis is the 'Month_Realesed' and the z-axis is the 'Revenue'. The purpose of this animation is to partition the movies in the Drama genre from the 'Drama_DataFrame' dataframe into k clusters, in which each observation belongs to the cluster with the nearest mean. These clusters will then be analyzed by observing the Budget spent and the Revenue generated per cluster.
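
The phrase "each observation belongs to the cluster with the nearest mean" describes one round of k-means: assign every point to its closest centroid, then move each centroid to the mean of its points. The sketch below shows that single round in pure Python. The helper names and the four points (Budget in millions, month, Revenue in millions) are hypothetical and only illustrate the idea.

```python
def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))

def update_centroids(points, labels, k):
    """Mean of the points assigned to each of the k clusters."""
    centroids = []
    for cluster in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster]
        if not members:
            raise ValueError(f"cluster {cluster} has no members")
        centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return centroids

# Hypothetical observations: (budget in $M, release month, revenue in $M).
points = [(5, 2, 12), (8, 3, 20), (60, 11, 300), (70, 12, 350)]
centroids = [points[0], points[3]]  # arbitrary starting centroids

labels = [nearest_centroid(p, centroids) for p in points]
new_centroids = update_centroids(points, labels, k=2)
print(labels)         # [0, 0, 1, 1]
print(new_centroids)  # [(6.5, 2.5, 16.0), (65.0, 11.5, 325.0)]
```

A full k-means run repeats these two steps until the labels stop changing.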

    Getting the sum of squared errors (SSE) for 1 to 9 clusters of the Budget, Month Released and Revenue of the movies in the Drama genre from the 'Drama_DataFrame' dataframe, to determine the optimal number of clusters.

    In [210]:
    k_rng =  range(1, 10)
    sse = []
    for k in k_rng:
        km =  KMeans(n_clusters = k)
        km.fit(df_month[['Budget','Month_Realesed','Revenue']])
        sse.append(km.inertia_)
    
    C:\Users\rutho\AppData\Roaming\Python\Python39\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
      warnings.warn(
    

    Showing the 'sse' list.

    In [184]:
    sse
    
    Out[184]:
    [3.9701565672727327e+18,
     1.3374768179929006e+18,
     6.927203948051537e+17,
     4.33795586558498e+17,
     3.111289467424199e+17,
     1.974580280900467e+17,
     1.5998318293649258e+17,
     1.3456461461306184e+17,
     1.1529437847578322e+17]
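
The SSE values above are on the order of 1e17 to 1e18 because Budget and Revenue are measured in dollars, while Month_Realesed only ranges from 1 to 12. The squared distances are therefore dominated by the two dollar columns. A common remedy, which this notebook does not apply, is to standardize each column before clustering. The sketch below is a minimal pure-Python illustration with hypothetical rows, and `standardize` is a hypothetical helper name.

```python
import statistics

def standardize(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        raise ValueError("column is constant; cannot standardize")
    return [(value - mean) / std for value in column]

# Hypothetical rows, for illustration only.
budget = [2_000_000, 10_000_000, 40_000_000, 60_000_000]
month = [1, 4, 7, 12]
revenue = [3_000_000, 50_000_000, 120_000_000, 380_000_000]

scaled_rows = list(zip(standardize(budget), standardize(month), standardize(revenue)))
for row in scaled_rows:
    print(tuple(round(v, 2) for v in row))
```

After scaling, all three columns contribute on a comparable scale to the distances, so the release month is no longer negligible next to the dollar amounts.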

    Plotting the sum of squared errors (SSE) against the number of clusters to determine the optimal number of clusters for the movies in the Drama genre from the 'Drama_DataFrame' dataframe using the elbow method. The elbow in the plot below indicates that the optimal number of clusters is two.

    In [185]:
    plt.xlabel('Number of clusters (k)')
    plt.ylabel('Sum of Squared Error')
    plt.plot(k_rng,sse)
    
    Out[185]:
    [<matplotlib.lines.Line2D at 0x29922085e80>]
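
The elbow can also be read from the numbers instead of the plot. The sketch below takes the SSE values printed earlier and computes, for each step from k to k + 1 clusters, the fraction by which the SSE falls. The helper name `relative_drops` is hypothetical, and this is only one simple heuristic for locating an elbow.

```python
# SSE values for k = 1..9, copied from the 'sse' list shown earlier.
sse = [
    3.9701565672727327e+18, 1.3374768179929006e+18, 6.927203948051537e+17,
    4.33795586558498e+17, 3.111289467424199e+17, 1.974580280900467e+17,
    1.5998318293649258e+17, 1.3456461461306184e+17, 1.1529437847578322e+17,
]

def relative_drops(values):
    """Fractional decrease in SSE when going from k to k + 1 clusters."""
    return [(prev - nxt) / prev for prev, nxt in zip(values, values[1:])]

drops = relative_drops(sse)
for k, drop in enumerate(drops, start=1):
    print(f"k={k} -> k={k + 1}: SSE falls by {drop:.1%}")
```

The largest relative drop is the first one, from one cluster to two, which is consistent with choosing two clusters.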

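    Creating the cluster list: movies released from January to June are assigned to cluster 0 and movies released from July to December are assigned to cluster 1.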

    In [458]:
    cluster = []
    for i in df_month.Month_Realesed:
        if i in [1,2,3,4,5,6]: cluster.append(0)
        elif i in [7,8,9,10,11,12]: cluster.append(1)
    

    Adding the cluster list to the 'df_month' dataframe.

    In [459]:
    df_month['cluster'] = cluster
    

    The updated 'df_month' dataframe. (this dataframe is interactive)

    In [460]:
    df_month
    
    Out[460]:
    Budget Month_Realesed Revenue cluster

    Creating a 3D scatter plot of the Budget, Month Released and Revenue of the movies in the Drama genre from the 'Drama_DataFrame' dataframe, with each of the two clusters drawn in its own color, using the 'animate' function and 'animation.FuncAnimation' to create an animated 3D scatter plot.

    In [460]:
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    
    # split the movies by cluster so each cluster gets its own color
    df1 = df_month[df_month.cluster==0]
    df2 = df_month[df_month.cluster==1]
    
    scatter = ax.scatter(df1['Budget'],df1['Month_Realesed'],df1['Revenue'], alpha=0.5,s=50, color='#C41E3A')
    scatter = ax.scatter(df2['Budget'],df2['Month_Realesed'],df2['Revenue'], alpha=0.5,s=50, color='#702963')
    
    ax.set_xlabel('Budget')
    ax.set_ylabel('Month_Realesed')
    ax.set_zlabel('Revenue')
    
    ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
    ani
    
    Out[460]:
    <matplotlib.animation.FuncAnimation at 0x2bfc50eb550>

    Saving the animated 3D scatter plot gif as 'drama3.gif'.

    In [461]:
    # Saving with an explicit Pillow writer, so matplotlib does not need to look for ffmpeg
    writergif = animation.PillowWriter(fps=10)
    ani.save('drama3.gif', writer=writergif)
    

    The second 3D scatter plot (part B): the x-axis is the 'Budget', the y-axis is the 'Month_Realesed' and the z-axis is the 'Revenue'. The purpose of this animation is to show the movies in the Drama genre from the 'Drama_DataFrame' dataframe split into the two clusters (January to June and July to December). These clusters will then be analyzed by observing the Budget spent and the Revenue generated per cluster.

    Getting the index of all the movies in the Drama genre that were released from January to June, from the 'df_month' dataframe.

    In [704]:
    cluster_a_index = []
    for i,x in enumerate(df_month.Month_Realesed):
        # months 1-6 are January to June
        if x in (1, 2, 3, 4, 5, 6):cluster_a_index.append(i)
    print(cluster_a_index)#showing the cluster_a_index list
    
    [2, 3, 4, 7, 12, 17, 21, 23, 24, 27, 29, 32, 35, 36, 37, 40, 43, 46, 47, 50, 53, 59, 61, 63, 64, 70, 73, 74, 75, 78, 79, 80, 83, 84, 85, 93, 95, 96, 99, 100, 101, 102, 104, 109, 111, 115, 116, 117, 119, 122, 126, 139, 141, 144, 145, 147, 148, 149, 152, 153, 155, 156, 158, 161, 162, 163, 164, 165, 170, 174, 175, 178, 179, 180, 183, 184, 186, 192, 194, 195, 197, 198, 200, 202, 207, 216, 218, 219, 220, 224]
    

    Checking the number of elements in the 'cluster_a_index' list.

    In [705]:
    len(cluster_a_index)
    
    Out[705]:
    90

    Using the indexes from the 'cluster_a_index' list to get the Month_Realesed, Revenue and Budget of each movie that was released from January to June.

    In [602]:
    month_a = []
    rev_a = []
    budg_a = []
    for i in cluster_a_index:
        month_a.append(df_month['Month_Realesed'][i])
        rev_a.append(df_month['Revenue'][i])
        budg_a.append(df_month['Budget'][i])
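
The index list and the three value lists can also be built together, which keeps them aligned by construction. The sketch below is a pure-Python illustration. The function name `first_half_rows` and the sample values are hypothetical and stand in for the columns of df_month.

```python
def first_half_rows(months, revenues, budgets):
    """Return (indexes, months, revenues, budgets) for months 1 through 6."""
    keep = [i for i, m in enumerate(months) if m in (1, 2, 3, 4, 5, 6)]
    return (
        keep,
        [months[i] for i in keep],
        [revenues[i] for i in keep],
        [budgets[i] for i in keep],
    )

# Hypothetical columns, for illustration only.
months = [12, 2, 7, 5]
revenues = [90_000_000, 40_000_000, 15_000_000, 8_000_000]
budgets = [30_000_000, 10_000_000, 5_000_000, 2_000_000]

index_a, month_a, rev_a, budg_a = first_half_rows(months, revenues, budgets)
print(index_a)  # [1, 3]
print(month_a)  # [2, 5]
```

Because all four lists come from the same `keep` indexes, their lengths always match.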
    

    Showing the 'month_a' list.

    In [603]:
    print(month_a)
    
    [5, 2, 2, 2, 4, 4, 3, 1, 6, 1, 1, 2, 5, 3, 6, 4, 3, 5, 4, 5, 1, 3, 2, 6, 3, 1, 6, 4, 2, 3, 3, 1, 5, 1, 2, 2, 6, 5, 3, 2, 5, 4, 3, 6, 5, 3, 4, 4, 6, 3, 5, 4, 1, 5, 4, 2, 4, 2, 2, 4, 4, 2, 6, 4, 3, 2, 3, 2, 1, 5, 4, 1, 1, 4, 1, 3, 5, 3, 4, 4, 3, 2, 1, 2, 4, 6, 5, 1, 3, 4]
    

    Showing the 'rev_a' list.

    In [604]:
    print(rev_a)
    
    [84154026, 381398492, 371350619, 570998101, 31054727, 38358392, 12034913, 56178935, 70133905, 10765283, 17536004, 40454520, 23251930, 16610760, 16131551, 2088390, 14244931, 1156309, 429448, 77211836, 3256082, 92678948, 12231500, 46918287, 542351353, 47494916, 114830111, 18948425, 137587063, 89137047, 64667874, 106269971, 13835130, 134582776, 6101815, 119285432, 14923752, 125052686, 8443124, 80008942, 48000000, 2411143, 80693537, 325500000, 3750000, 80491516, 311281000, 286214195, 986214868, 47707417, 12000000, 116809717, 97143987, 61721826, 63802928, 197618160, 68984536, 94050951, 142033509, 96633833, 29847480, 82917283, 208265198, 334522294, 38028230, 52545707, 56506120, 128955898, 32909437, 61603136, 31556959, 21971021, 31187727, 36964656, 41699612, 18945682, 15298355, 17356268, 277845, 1614784, 98410061, 15121165, 20412841, 15307113, 3822241, 9000000, 101173038, 36147711, 413802, 1470809]
    

    Showing the 'budg_a' list.

    In [605]:
    print(budg_a)
    
    [60000000, 55000000, 55000000, 40000000, 22500000, 13000000, 12000000, 11000000, 10000000, 7000000, 4900000, 3500000, 3000000, 2000000, 2000000, 2000000, 1500000, 1000000, 135000, 8500000, 2700000, 20000000, 1700000, 10000000, 95000000, 11800000, 40000000, 8000000, 17000000, 20000000, 2000000, 23000000, 10000000, 16000000, 3000000, 15000000, 7500000, 17000000, 4500000, 8200000, 28000000, 700000, 22000000, 70000000, 2500000, 22000000, 18000000, 8200000, 45000000, 10000000, 1700000, 38000000, 37000000, 35000000, 34000000, 30000000, 30000000, 28000000, 25000000, 25000000, 25000000, 25000000, 20000000, 17000000, 17000000, 16000000, 16000000, 15000000, 12000000, 10000000, 10000000, 9000000, 7400000, 7000000, 5000000, 5000000, 2600000, 12500000, 20000, 955472, 9000000, 15000000, 6500000, 15000000, 3565572, 1000000, 6500000, 1250000, 12000, 612072]
    

    Showing how often each release month occurs among the Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June. The counts are stored in a Counter called 'grouped_month_a'.

    In [693]:
    grouped_month_a= Counter(month_a)
    print(grouped_month_a)#showing the grouped_month_a dictionary
    
    Counter({4: 20, 2: 17, 3: 17, 1: 14, 5: 13, 6: 9})
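
A Counter can also report the months in order of frequency and as shares of the total. The short sketch below starts from the counts printed above. The variable names are hypothetical.

```python
from collections import Counter

# Counts copied from the Counter output above.
month_counts = Counter({4: 20, 2: 17, 3: 17, 1: 14, 5: 13, 6: 9})
total = sum(month_counts.values())

# most_common() yields (month, count) pairs from most to least frequent.
for month, count in month_counts.most_common():
    print(f"month {month}: {count} movies ({count / total:.0%})")
```

The total of 90 matches the length of the 'cluster_a_index' list, and April (month 4) is the most frequent release month in this cluster.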
    

    Showing how often each budget value occurs among the Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June, counted from the 'budg_a' list.

    In [607]:
    print(Counter(budg_a))
    
    Counter({10000000: 6, 2000000: 4, 17000000: 4, 15000000: 4, 25000000: 4, 20000000: 3, 16000000: 3, 55000000: 2, 40000000: 2, 12000000: 2, 7000000: 2, 3000000: 2, 1000000: 2, 1700000: 2, 8200000: 2, 28000000: 2, 22000000: 2, 30000000: 2, 9000000: 2, 5000000: 2, 6500000: 2, 60000000: 1, 22500000: 1, 13000000: 1, 11000000: 1, 4900000: 1, 3500000: 1, 1500000: 1, 135000: 1, 8500000: 1, 2700000: 1, 95000000: 1, 11800000: 1, 8000000: 1, 23000000: 1, 7500000: 1, 4500000: 1, 700000: 1, 70000000: 1, 2500000: 1, 18000000: 1, 45000000: 1, 38000000: 1, 37000000: 1, 35000000: 1, 34000000: 1, 7400000: 1, 2600000: 1, 12500000: 1, 20000: 1, 955472: 1, 3565572: 1, 1250000: 1, 12000: 1, 612072: 1})
    

    Getting the minimum budget spent by Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June.

    In [100]:
    min(budg_a)
    
    Out[100]:
    12000

    Creating a function called 'Average' that gets the average value of a list of values.

    In [101]:
    def Average(lst):
        return sum(lst) / len(lst)
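
The 'Average' function above divides by the length of the list, so an empty list raises a ZeroDivisionError. As a minimal sketch, the standard library's statistics module offers the same calculation, and its median is often worth printing alongside the mean when a few large budgets pull the average up. The function name `average` and the four sample budgets are hypothetical.

```python
import statistics

def average(values):
    """Arithmetic mean with a clearer error for empty input."""
    if not values:
        raise ValueError("average() of an empty list is undefined")
    return statistics.fmean(values)

# Hypothetical subset of budgets (in dollars), for illustration only.
sample_budgets = [12_000, 2_000_000, 10_000_000, 95_000_000]

print(average(sample_budgets))            # 26753000.0
print(statistics.median(sample_budgets))  # 6000000.0
```

In this small sample the single large budget lifts the mean well above the median, which is why the two numbers are useful side by side.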
    

    Getting the average budget of Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June.

    In [609]:
    average_budg_a = Average(budg_a)
    average_budg_a #16,082,779
    
    Out[609]:
    16082779.066666666

    Getting the index of Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June and have a budget of up to $20 million.

    In [611]:
    group_one_index = []
    for i in cluster_a_index:
        if 0 <= df_month['Budget'][i] <= 20000000:group_one_index.append(i) 
    print(group_one_index)#showing the group_one_index list
    
    [17, 21, 23, 24, 27, 29, 32, 35, 36, 37, 40, 43, 46, 47, 50, 53, 59, 61, 63, 70, 74, 75, 78, 79, 83, 84, 85, 93, 95, 96, 99, 100, 102, 111, 116, 117, 122, 126, 158, 161, 162, 163, 164, 165, 170, 174, 175, 178, 179, 180, 183, 184, 186, 192, 194, 195, 197, 198, 200, 202, 207, 216, 218, 219, 220, 224]
    

    Checking the number of elements in the 'group_one_index' list.

    In [614]:
    len(group_one_index)
    
    Out[614]:
    66

    Getting the index of Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June and have a budget greater than $20 million.

    In [612]:
    group_two_index = []
    for i in cluster_a_index:
        if 20000001 <= df_month['Budget'][i] :group_two_index.append(i)  
    print(group_two_index)#showing the group_two_index list
    
    [2, 3, 4, 7, 12, 64, 73, 80, 101, 104, 109, 115, 119, 139, 141, 144, 145, 147, 148, 149, 152, 153, 155, 156]
    

    Checking the number of elements in the 'group_two_index' list.

    In [613]:
    len(group_two_index)
    
    Out[613]:
    24
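
The two budget groups can also be produced in a single pass, which guarantees that every index lands in exactly one group. The sketch below is a pure-Python illustration. The function name `split_by_budget` and the sample budgets are hypothetical, and the threshold of 20,000,000 is the one used in the cells above.

```python
def split_by_budget(indexes, budgets, threshold=20_000_000):
    """Partition indexes into (budget <= threshold, budget > threshold)."""
    low, high = [], []
    for i in indexes:
        (low if budgets[i] <= threshold else high).append(i)
    return low, high

# Hypothetical budgets keyed by row index, for illustration only.
budgets = {2: 60_000_000, 17: 13_000_000, 21: 20_000_000, 64: 20_000_001}

group_one_index, group_two_index = split_by_budget([2, 17, 21, 64], budgets)
print(group_one_index)  # [17, 21]
print(group_two_index)  # [2, 64]
```

Since the groups are complementary, their sizes add up to the size of the input, just as 66 and 24 add up to the 90 indexes in 'cluster_a_index'.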

    Creating a function called 'round_to_multiple' that rounds a value to the nearest multiple of a chosen number.

    In [102]:
    def round_to_multiple(number, multiple):
        return multiple * round(number / multiple)
    

    Using the 'round_to_multiple' function to round the Revenue of Drama movies to the nearest 10 million, for movies that were released in the months of January to June and have a budget of up to $20 million.

    In [617]:
    rev_a_one = []
    for i in group_one_index:rev_a_one.append(round_to_multiple(df_month['Revenue'][i],10000000))
    print(rev_a_one)#showing the rev_a_one list
    
    [40000000, 10000000, 60000000, 70000000, 10000000, 20000000, 40000000, 20000000, 20000000, 20000000, 0, 10000000, 0, 0, 80000000, 0, 90000000, 10000000, 50000000, 50000000, 20000000, 140000000, 90000000, 60000000, 10000000, 130000000, 10000000, 120000000, 10000000, 130000000, 10000000, 80000000, 0, 0, 310000000, 290000000, 50000000, 10000000, 210000000, 330000000, 40000000, 50000000, 60000000, 130000000, 30000000, 60000000, 30000000, 20000000, 30000000, 40000000, 40000000, 20000000, 20000000, 20000000, 0, 0, 100000000, 20000000, 20000000, 20000000, 0, 10000000, 100000000, 40000000, 0, 0]
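
The zeros in the list above come from revenues below 5 million, which round down to 0. One detail worth knowing is that Python's built-in round sends exact ties to the nearest even integer, so a revenue of exactly 25,000,000 rounds to 20,000,000 rather than 30,000,000. The sketch below compares the notebook's function with a hypothetical half-up variant, `round_half_up_to_multiple`, which is not part of the notebook.

```python
import math

def round_to_multiple(number, multiple):
    """Same definition as in the notebook: ties go to the even multiple."""
    return multiple * round(number / multiple)

def round_half_up_to_multiple(number, multiple):
    """Hypothetical variant: ties always round up."""
    return multiple * math.floor(number / multiple + 0.5)

for value in (5_000_000, 15_000_000, 25_000_000, 277_845):
    print(value,
          round_to_multiple(value, 10_000_000),
          round_half_up_to_multiple(value, 10_000_000))
```

Exact ties are rare in real revenue figures, so the two versions usually agree. The difference only matters for round numbers such as estimated grosses.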
    

    Showing how often each rounded revenue value occurs among the Drama movies from the 'Drama_DataFrame' dataframe that were released in the months of January to June with a budget of up to $20 million.

    In [618]:
    print(Counter(rev_a_one))
    
    Counter({20000000: 12, 0: 11, 10000000: 10, 40000000: 6, 60000000: 4, 50000000: 4, 130000000: 3, 30000000: 3, 80000000: 2, 90000000: 2, 100000000: 2, 70000000: 1, 140000000: 1, 120000000: 1, 310000000: 1, 290000000: 1, 210000000: 1, 330000000: 1})
    

    Getting the revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released in the months of January to June with a budget of up to $20 million.

    In [619]:
    group_one = []
    for i in group_one_index:group_one.append(df_month['Revenue'][i])
    print(group_one)#showing the group_one list
    
    [38358392, 12034913, 56178935, 70133905, 10765283, 17536004, 40454520, 23251930, 16610760, 16131551, 2088390, 14244931, 1156309, 429448, 77211836, 3256082, 92678948, 12231500, 46918287, 47494916, 18948425, 137587063, 89137047, 64667874, 13835130, 134582776, 6101815, 119285432, 14923752, 125052686, 8443124, 80008942, 2411143, 3750000, 311281000, 286214195, 47707417, 12000000, 208265198, 334522294, 38028230, 52545707, 56506120, 128955898, 32909437, 61603136, 31556959, 21971021, 31187727, 36964656, 41699612, 18945682, 15298355, 17356268, 277845, 1614784, 98410061, 15121165, 20412841, 15307113, 3822241, 9000000, 101173038, 36147711, 413802, 1470809]
    

    Getting the minimum revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June with a budget of up to $20 million.

    In [305]:
    min(group_one)
    
    Out[305]:
    277845

    In [620]:
    # revenue from 200,000 to 10,000,000
    stor1 = []
    for i in group_one:
        if 200000<=i<=10000000:stor1.append(i)
    print(stor1)#showing the stor1 list
    
    [2088390, 1156309, 429448, 3256082, 6101815, 8443124, 2411143, 3750000, 277845, 1614784, 3822241, 9000000, 413802, 1470809]
    
    In [621]:
    # Revenue from $10,000,001 to $50 Million
    stor2 = []
    for i in group_one:
        if 10000001<=i<=50000000:stor2.append(i)
    print(stor2)#showing the stor2 list   
    
    [38358392, 12034913, 10765283, 17536004, 40454520, 23251930, 16610760, 16131551, 14244931, 12231500, 46918287, 47494916, 18948425, 13835130, 14923752, 47707417, 12000000, 38028230, 32909437, 31556959, 21971021, 31187727, 36964656, 41699612, 18945682, 15298355, 17356268, 15121165, 20412841, 15307113, 36147711]
    
    In [622]:
    # Revenue from $50,000,001 to $100 Million
    stor3 = []
    for i in group_one:
        if 50000001<=i<=100000000:stor3.append(i)
    print(stor3)#showing the stor3 list
    
    [56178935, 70133905, 77211836, 92678948, 89137047, 64667874, 80008942, 52545707, 56506120, 61603136, 98410061]
    
    In [623]:
    # Revenue from $100,000,001 to $200 Million
    stor4 = []
    for i in group_one:
        if 100000001<=i<=200000000:stor4.append(i)
    print(stor4)#showing the stor4 list
    
    [137587063, 134582776, 119285432, 125052686, 128955898, 101173038]
    
    In [624]:
    # Revenue from $200,000,001 to $300 Million
    stor5 = []
    for i in group_one:
        if 200000001<=i<=300000000:stor5.append(i)
    print(stor5)#showing the stor5 list
    
    [286214195, 208265198]
    
    In [625]:
    # Revenue from $300,000,001 to $400 Million
    stor6 = []
    for i in group_one:
        if 300000001<=i<=400000000:stor6.append(i)
    print(stor6)#showing the stor6 list
    
    [311281000, 334522294]
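
The six range filters above can also be expressed as one pass over the data. This is a minimal sketch, not the notebook's method: `EDGES` and `bin_revenue` are hypothetical names, and the sample values are taken from the 'group_one' list.

```python
from bisect import bisect_left

# Upper edge (inclusive) of each revenue range used for the first budget group.
EDGES = [10_000_000, 50_000_000, 100_000_000, 200_000_000, 300_000_000, 400_000_000]

def bin_revenue(values, edges):
    """Place each value in the first range whose upper edge is >= the value."""
    bins = [[] for _ in edges]
    for v in values:
        idx = bisect_left(edges, v)  # index of the first edge >= v
        if idx < len(edges):         # values above the last edge are skipped
            bins[idx].append(v)
    return bins

sample = [2088390, 38358392, 56178935, 137587063, 286214195, 311281000]
bins = bin_revenue(sample, EDGES)
for upper, members in zip(EDGES, bins):
    print(upper, members)
```

Unlike the first filter above, this sketch does not enforce the $200,000 lower bound, so any smaller value would land in the first range.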
    

    Using the 'round_to_multiple' function to round the revenue of Drama movies that were released from January to June with a budget greater than $21 Million to the nearest 50 Million.

    In [627]:
    rev_a_two = []
    for i in group_two_index:rev_a_two.append(round_to_multiple(df_month['Revenue'][i],50000000))
    print(rev_a_two)#showing the rev_a_two list
    
    [100000000, 400000000, 350000000, 550000000, 50000000, 550000000, 100000000, 100000000, 50000000, 100000000, 350000000, 100000000, 1000000000, 100000000, 100000000, 50000000, 50000000, 200000000, 50000000, 100000000, 150000000, 100000000, 50000000, 100000000]
    

    Showing how often each rounded revenue value occurs for Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June with a budget greater than $21 Million.

    In [628]:
    print(Counter(rev_a_two))
    
    Counter({100000000: 10, 50000000: 6, 350000000: 2, 550000000: 2, 400000000: 1, 1000000000: 1, 200000000: 1, 150000000: 1})
    

    Getting the revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June with a budget greater than $21 Million.

    In [629]:
    group_two = []
    for i in group_two_index:group_two.append(df_month['Revenue'][i])
    print(group_two)#showing the group_two list
    
    [84154026, 381398492, 371350619, 570998101, 31054727, 542351353, 114830111, 106269971, 48000000, 80693537, 325500000, 80491516, 986214868, 116809717, 97143987, 61721826, 63802928, 197618160, 68984536, 94050951, 142033509, 96633833, 29847480, 82917283]
    

    Getting the maximum revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June with a budget greater than $21 Million.

    In [630]:
    max(group_two)
    
    Out[630]:
    986214868
    In [631]:
    # Revenue from $20 Million to $100 Million
    stor7 = []
    for i in group_two:
        if 20000000 <= i<=100000000:stor7.append(i)
    print(stor7)#showing the stor7 list
    
    [84154026, 31054727, 48000000, 80693537, 80491516, 97143987, 61721826, 63802928, 68984536, 94050951, 96633833, 29847480, 82917283]
    
    In [632]:
    # Revenue from $100,000,001 to $200 Million
    stor8 = []
    for i in group_two:
        if 100000001 <= i<=200000000:stor8.append(i)
    print(stor8)#showing the stor8 list
    
    [114830111, 106269971, 116809717, 197618160, 142033509]
    
    In [633]:
    # Revenue from $200,000,001 to $400 Million
    stor9 = []
    for i in group_two:
        if 200000001 <=i<=400000000:stor9.append(i)
    print(stor9)#showing the stor9 list
    
    [381398492, 371350619, 325500000]
    
    In [634]:
    # Revenue from $400,000,001 to $600 Million
    stor10 = []
    for i in group_two:
        if 400000001 <=i<=600000000:stor10.append(i)
    print(stor10)#showing the stor10 list
    
    [570998101, 542351353]
    
    In [635]:
    # Revenue from $900,000,001 to $1 Billion
    stor11 = []
    for i in group_two:
        if 900000001 <= i<=1000000000:stor11.append(i)
    print(stor11)#showing the stor11 list
    
    [986214868]
    

    Getting the index of all the movies in the Drama Genre that were released from July to December, from the 'df_month' dataframe.

    In [711]:
    cluster_b_index = []
    for i,x in enumerate(df_month.Month_Realesed):
        # keep the positions of movies released in July (7) through December (12)
        if x in (7, 8, 9, 10, 11, 12):cluster_b_index.append(i)
    print(cluster_b_index)#showing the cluster_b_index list
    
    [0, 1, 5, 6, 8, 9, 10, 11, 13, 14, 15, 16, 18, 19, 20, 22, 25, 26, 28, 30, 31, 33, 34, 38, 39, 41, 42, 44, 45, 48, 49, 51, 52, 54, 55, 56, 57, 58, 60, 62, 65, 66, 67, 68, 69, 71, 72, 76, 77, 81, 82, 86, 87, 88, 89, 90, 91, 92, 94, 97, 98, 103, 105, 106, 107, 108, 110, 112, 113, 114, 118, 120, 121, 123, 124, 125, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 140, 142, 143, 146, 150, 151, 154, 157, 159, 160, 166, 167, 168, 169, 171, 172, 173, 176, 177, 181, 182, 185, 187, 188, 189, 190, 191, 193, 196, 199, 201, 203, 204, 205, 206, 208, 209, 210, 211, 212, 213, 214, 215, 217, 221, 222, 223]
    

    Checking the number of elements in the 'cluster_b_index' list.

    In [712]:
    len(cluster_b_index)
    
    Out[712]:
    135

    Using the indexes from the 'cluster_b_index' list to get the Month_Realesed, Revenue and Budget of each movie that was released from July to December.

    In [713]:
    month_b = []
    rev_b = []
    budg_b = []
    for i in cluster_b_index:
        month_b.append(df_month['Month_Realesed'][i])
        rev_b.append(df_month['Revenue'][i])
        budg_b.append(df_month['Budget'][i])
    

    Showing the 'month_b' list.

    In [714]:
    print(month_b)
    
    [12, 10, 10, 12, 9, 11, 10, 11, 11, 8, 10, 12, 10, 12, 9, 11, 11, 11, 10, 9, 7, 10, 10, 10, 8, 10, 9, 12, 10, 7, 9, 7, 12, 10, 9, 11, 9, 11, 8, 10, 8, 11, 12, 8, 12, 10, 10, 11, 9, 7, 7, 11, 10, 12, 10, 7, 9, 12, 12, 7, 7, 11, 11, 8, 7, 10, 8, 12, 10, 7, 12, 8, 12, 11, 10, 9, 10, 12, 9, 11, 11, 12, 10, 11, 11, 7, 10, 12, 11, 12, 12, 7, 10, 8, 8, 12, 9, 11, 12, 8, 10, 9, 7, 8, 11, 12, 10, 9, 7, 12, 9, 11, 10, 7, 12, 10, 9, 10, 10, 12, 10, 12, 9, 9, 9, 10, 12, 7, 8, 10, 11, 7, 9, 10, 10]
    

    Showing the 'rev_b' list.

    In [715]:
    print(rev_b)
    
    [449948323, 368567189, 74966854, 134612435, 50647416, 160558438, 77735925, 32398681, 38017873, 46604054, 28270399, 331266710, 36262783, 19859167, 35830713, 42843521, 21817298, 77733867, 17499242, 4972016, 57273049, 20433227, 38969037, 11295324, 10153415, 6328516, 21270290, 16566240, 5438911, 2769782, 54766923, 34718173, 1951683, 13000000, 11000000, 180047784, 96068724, 304604712, 73975239, 9709597, 73986904, 305937718, 216601214, 38102988, 27118000, 19344615, 38741732, 64605762, 33473297, 152036382, 171120329, 63954968, 15164458, 127956187, 43440294, 17815212, 157297525, 35856053, 40716963, 549368315, 64892670, 18587135, 438656843, 66947950, 27469621, 37799643, 246100000, 4517000, 143985708, 17657973, 90482317, 268000000, 72071636, 30194409, 65500000, 7600377, 693698673, 634454789, 137551594, 90552675, 213591522, 179748880, 108660270, 71004627, 203127894, 48478084, 162498338, 169590606, 173567581, 85309093, 252276928, 165552290, 41059418, 213120004, 66540205, 64282881, 22281732, 76086711, 20601987, 59168692, 34044909, 33069303, 23477345, 78356170, 62076141, 36787044, 81831866, 16369708, 148806510, 6205034, 35185884, 5552584, 3728400, 2102779, 20412841, 1008404, 20412216, 67091915, 19465835, 20412841, 19465835, 16566240, 2315026, 213120004, 65167430, 2661944, 20412841, 3453416, 50283563, 3894240, 2038916, 20412216, 65167430, 5746453, 1008404]
    

    Showing the 'budg_b' list.

    In [716]:
    print(budg_b)
    
    [100000000, 61000000, 55000000, 52500000, 37500000, 31000000, 23000000, 22500000, 21000000, 20000000, 20000000, 13000000, 13000000, 12000000, 12000000, 11800000, 9400000, 8500000, 5000000, 4750000, 4000000, 3400000, 3300000, 2000000, 2000000, 2000000, 1987650, 1000000, 1000000, 100000, 6000000, 20000000, 100000, 11500000, 9000000, 180000000, 37000000, 20000000, 3000000, 5100000, 3000000, 20000000, 40000000, 5000000, 422000, 15000000, 32000000, 30000000, 500000, 32000000, 90000000, 15000000, 10000000, 20000000, 12000000, 5000000, 7000000, 14000000, 12000000, 5000000, 22000000, 7000000, 20000000, 23000000, 15000000, 2700000, 30000000, 666000, 85000000, 10000000, 60000000, 858000, 17000000, 6400000, 13000000, 1750000, 110000000, 75000000, 60000000, 55000000, 50000000, 50000000, 50000000, 49000000, 47000000, 44000000, 40000000, 40000000, 37000000, 36000000, 35000000, 33000000, 26000000, 25000000, 25000000, 24000000, 20000000, 19000000, 15000000, 15000000, 14000000, 13000000, 12000000, 11000000, 11000000, 9700000, 9000000, 6000000, 5000000, 5000000, 2000000, 1400000, 250000, 175000, 6500000, 1000000, 1500000, 15000000, 4000000, 6500000, 4074940, 1000000, 1000000, 12000000, 15000000, 350000, 6500000, 904765, 34000000, 230000, 1000000, 1500000, 15000000, 2200000, 50000]
    

    Counting how many Drama movies from the 'Drama_DataFrame' dataframe were released in each month from July to December. The counts are stored in a Counter (a dictionary subclass) called 'grouped_month_b'.

    In [717]:
    grouped_month_b = Counter(month_b)
    print(grouped_month_b)#showing the grouped_month_b counter
    
    Counter({10: 35, 12: 27, 11: 23, 9: 20, 7: 17, 8: 13})
    

    Getting the minimum budget spent on Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December.

    In [718]:
    min(budg_b)
    
    Out[718]:
    50000

    Getting the average budget of Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December.

    In [259]:
    average_budg_b= Average(budg_b)
    average_budg_b  # mean budget in dollars
    
    Out[259]:
    20651598.22033898
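
The `Average` helper used above is defined elsewhere in the notebook and is not shown in this section. A minimal stand-in, assuming it returns the arithmetic mean of a list, could look like this (the lowercase name `average` is hypothetical, and the three budgets are the first values of 'budg_b'):

```python
from statistics import mean

def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("average() needs at least one value")
    return mean(values)

budgets = [100000000, 61000000, 55000000]
average_budget = average(budgets)
print(average_budget)
```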

    Using the 'round_to_multiple' function to round the budget of Drama movies that were released from July to December to the nearest 100,000.

    In [647]:
    bud_b = []
    for i in cluster_b_index:bud_b.append(round_to_multiple(df_month['Budget'][i],100000))
    print(bud_b)#showing the bud_b list
    
    [100000000, 61000000, 55000000, 52500000, 37500000, 31000000, 23000000, 22500000, 21000000, 20000000, 20000000, 13000000, 13000000, 12000000, 12000000, 11800000, 9400000, 8500000, 5000000, 4800000, 3400000, 3300000, 2000000, 2000000, 2000000, 2000000, 1000000, 1000000, 6000000, 100000, 11500000, 9000000, 180000000, 37000000, 20000000, 3000000, 5100000, 3000000, 20000000, 40000000, 5000000, 400000, 15000000, 32000000, 30000000, 500000, 15000000, 10000000, 20000000, 12000000, 7000000, 14000000, 12000000, 7000000, 20000000, 23000000, 2700000, 30000000, 700000, 85000000, 60000000, 900000, 17000000, 6400000, 13000000, 1800000, 110000000, 75000000, 60000000, 55000000, 50000000, 50000000, 50000000, 49000000, 47000000, 40000000, 40000000, 37000000, 36000000, 35000000, 26000000, 25000000, 25000000, 24000000, 20000000, 19000000, 15000000, 15000000, 14000000, 13000000, 11000000, 11000000, 9700000, 9000000, 6000000, 5000000, 2000000, 1400000, 200000, 6500000, 1000000, 1500000, 15000000, 4000000, 6500000, 4100000, 1000000, 1000000, 12000000, 15000000, 400000, 6500000, 34000000, 200000, 1000000, 15000000, 2200000, 0]
    

    Showing how often each rounded budget value occurs for Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December.

    In [650]:
    print(Counter(bud_b))
    
    Counter({20000000: 7, 15000000: 7, 1000000: 6, 12000000: 5, 2000000: 5, 13000000: 4, 5000000: 3, 40000000: 3, 50000000: 3, 6500000: 3, 55000000: 2, 23000000: 2, 6000000: 2, 9000000: 2, 37000000: 2, 3000000: 2, 400000: 2, 30000000: 2, 7000000: 2, 14000000: 2, 60000000: 2, 25000000: 2, 11000000: 2, 200000: 2, 100000000: 1, 61000000: 1, 52500000: 1, 37500000: 1, 31000000: 1, 22500000: 1, 21000000: 1, 11800000: 1, 9400000: 1, 8500000: 1, 4800000: 1, 3400000: 1, 3300000: 1, 100000: 1, 11500000: 1, 180000000: 1, 5100000: 1, 32000000: 1, 500000: 1, 10000000: 1, 2700000: 1, 700000: 1, 85000000: 1, 900000: 1, 17000000: 1, 6400000: 1, 1800000: 1, 110000000: 1, 75000000: 1, 49000000: 1, 47000000: 1, 36000000: 1, 35000000: 1, 26000000: 1, 24000000: 1, 19000000: 1, 9700000: 1, 1400000: 1, 1500000: 1, 4000000: 1, 4100000: 1, 34000000: 1, 2200000: 1, 0: 1})
    

    Using the 'round_to_multiple' function to round the revenue of Drama movies that were released from July to December to the nearest 50 Million.

    In [652]:
    rev_b = []
    for i in cluster_b_index:rev_b.append(round_to_multiple(df_month['Revenue'][i],50000000))
    print(rev_b)#showing the rev_b list
    
    [450000000, 350000000, 50000000, 150000000, 50000000, 150000000, 100000000, 50000000, 50000000, 50000000, 50000000, 350000000, 50000000, 0, 50000000, 50000000, 0, 100000000, 0, 0, 0, 50000000, 0, 0, 0, 0, 0, 0, 50000000, 0, 0, 0, 200000000, 100000000, 300000000, 50000000, 0, 50000000, 300000000, 200000000, 50000000, 50000000, 0, 50000000, 50000000, 50000000, 50000000, 0, 150000000, 50000000, 150000000, 50000000, 50000000, 0, 450000000, 50000000, 50000000, 250000000, 0, 150000000, 100000000, 250000000, 50000000, 50000000, 50000000, 0, 700000000, 650000000, 150000000, 100000000, 200000000, 200000000, 100000000, 50000000, 200000000, 150000000, 150000000, 150000000, 100000000, 250000000, 50000000, 200000000, 50000000, 50000000, 0, 100000000, 0, 50000000, 50000000, 50000000, 100000000, 50000000, 50000000, 100000000, 0, 0, 50000000, 0, 0, 0, 0, 0, 50000000, 0, 0, 0, 0, 0, 200000000, 50000000, 0, 0, 50000000, 0, 0, 50000000, 0, 0]
    

    Showing how often each rounded revenue value occurs for Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December.

    In [653]:
    print(Counter(rev_b))
    
    Counter({50000000: 41, 0: 40, 100000000: 10, 150000000: 9, 200000000: 7, 250000000: 3, 450000000: 2, 350000000: 2, 300000000: 2, 700000000: 1, 650000000: 1})
    

    Getting the index of Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December and have a budget of $40,000 to $20 Million.

    In [654]:
    group_one_index = []
    for i in cluster_b_index:
        if 40000 <= df_month['Budget'][i] <= 20000000:group_one_index.append(i) 
    print(group_one_index)#showing the group_one_index list
    
    [14, 15, 16, 18, 19, 20, 22, 25, 26, 28, 30, 33, 34, 38, 39, 41, 42, 44, 45, 49, 52, 54, 55, 58, 60, 62, 65, 66, 68, 69, 71, 77, 86, 87, 88, 89, 91, 92, 94, 103, 105, 108, 112, 120, 121, 123, 124, 125, 159, 160, 166, 167, 168, 169, 172, 173, 176, 177, 181, 185, 187, 188, 189, 191, 193, 196, 199, 201, 203, 204, 205, 206, 208, 209, 210, 211, 214, 215, 221, 222, 223]
    

    Getting the index of Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December and have a budget greater than $20 Million and up to $180 Million.

    In [655]:
    group_two_index = []
    for i in cluster_b_index:
        if 20000001<= df_month['Budget'][i] <= 180000000:group_two_index.append(i) 
    print(group_two_index)#showing the group_two_index list
    
    [0, 1, 5, 6, 8, 9, 10, 11, 13, 56, 57, 67, 72, 76, 106, 110, 113, 118, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 142, 143, 150, 151, 154, 157, 213]
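
Both budget groups can be built in a single loop. The sketch below assumes integer budgets and uses the same thresholds as the two cells above; `split_by_budget` and the small `budgets` list are hypothetical and only illustrate the idea.

```python
def split_by_budget(indexes, budgets, low=40_000, cutoff=20_000_000, high=180_000_000):
    """Split indexes into a low-budget and a high-budget group.

    group_one: low <= budget <= cutoff
    group_two: cutoff < budget <= high
    Budgets outside [low, high] are left out of both groups.
    """
    group_one, group_two = [], []
    for i in indexes:
        budget = budgets[i]
        if low <= budget <= cutoff:
            group_one.append(i)
        elif cutoff < budget <= high:
            group_two.append(i)
    return group_one, group_two

# Toy data: position in the list plays the role of the dataframe index.
budgets = [100000000, 61000000, 20000000, 13000000, 30000, 200000000]
group_one_index, group_two_index = split_by_budget(range(len(budgets)), budgets)
print(group_one_index)
print(group_two_index)
```

Using `elif` makes it explicit that a movie can fall into at most one group.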
    

    Getting the revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget of $40,000 to $20 Million.

    In [656]:
    group_one = []
    for i in group_one_index:group_one.append(df_month['Revenue'][i])
    print(group_one)#showing the group_one list
    
    [46604054, 28270399, 331266710, 36262783, 19859167, 35830713, 42843521, 21817298, 77733867, 17499242, 4972016, 20433227, 38969037, 11295324, 10153415, 6328516, 21270290, 16566240, 5438911, 54766923, 1951683, 13000000, 11000000, 304604712, 73975239, 9709597, 73986904, 305937718, 38102988, 27118000, 19344615, 33473297, 63954968, 15164458, 127956187, 43440294, 157297525, 35856053, 40716963, 18587135, 438656843, 37799643, 4517000, 268000000, 72071636, 30194409, 65500000, 7600377, 22281732, 76086711, 20601987, 59168692, 34044909, 33069303, 78356170, 62076141, 36787044, 81831866, 16369708, 6205034, 35185884, 5552584, 3728400, 20412841, 1008404, 20412216, 67091915, 19465835, 20412841, 19465835, 16566240, 2315026, 213120004, 65167430, 2661944, 20412841, 3894240, 2038916, 65167430, 5746453, 1008404]
    

    Checking the number of elements in the 'group_one' list.

    In [657]:
    len(group_one)
    
    Out[657]:
    81

    Getting the minimum revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget of $40,000 to $20 Million.

    In [658]:
    min(group_one)
    
    Out[658]:
    1008404

    Using the 'round_to_multiple' function to round the revenue of Drama movies that were released from July to December with a budget of $40,000 to $20 Million to the nearest 1 Million.

    In [665]:
    rev_b_one = []
    for i in group_one_index:rev_b_one.append(round_to_multiple(df_month['Revenue'][i],1000000))
    print(rev_b_one)#showing the rev_b_one list
    
    [47000000, 28000000, 331000000, 36000000, 20000000, 36000000, 43000000, 22000000, 78000000, 17000000, 5000000, 20000000, 39000000, 11000000, 10000000, 6000000, 21000000, 17000000, 5000000, 55000000, 2000000, 13000000, 11000000, 305000000, 74000000, 10000000, 74000000, 306000000, 38000000, 27000000, 19000000, 33000000, 64000000, 15000000, 128000000, 43000000, 157000000, 36000000, 41000000, 19000000, 439000000, 38000000, 5000000, 268000000, 72000000, 30000000, 66000000, 8000000, 22000000, 76000000, 21000000, 59000000, 34000000, 33000000, 78000000, 62000000, 37000000, 82000000, 16000000, 6000000, 35000000, 6000000, 4000000, 20000000, 1000000, 20000000, 67000000, 19000000, 20000000, 19000000, 17000000, 2000000, 213000000, 65000000, 3000000, 20000000, 4000000, 2000000, 65000000, 6000000, 1000000]
    

    Showing how often each rounded revenue value occurs for Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget of $40,000 to $20 Million.

    In [666]:
    print(Counter(rev_b_one))
    
    Counter({20000000: 6, 6000000: 4, 19000000: 4, 36000000: 3, 17000000: 3, 5000000: 3, 2000000: 3, 43000000: 2, 22000000: 2, 78000000: 2, 11000000: 2, 10000000: 2, 21000000: 2, 74000000: 2, 38000000: 2, 33000000: 2, 4000000: 2, 1000000: 2, 65000000: 2, 47000000: 1, 28000000: 1, 331000000: 1, 39000000: 1, 55000000: 1, 13000000: 1, 305000000: 1, 306000000: 1, 27000000: 1, 64000000: 1, 15000000: 1, 128000000: 1, 157000000: 1, 41000000: 1, 439000000: 1, 268000000: 1, 72000000: 1, 30000000: 1, 66000000: 1, 8000000: 1, 76000000: 1, 59000000: 1, 34000000: 1, 62000000: 1, 37000000: 1, 82000000: 1, 16000000: 1, 35000000: 1, 67000000: 1, 213000000: 1, 3000000: 1})
    
    In [667]:
    # Revenue from $900,000 to $10 Million
    stor12 = []
    for i in group_one:
        if 900000<=i<=10000000:stor12.append(i)
    print(stor12)#showing the stor12 list
    
    [4972016, 6328516, 5438911, 1951683, 9709597, 4517000, 7600377, 6205034, 5552584, 3728400, 1008404, 2315026, 2661944, 3894240, 2038916, 5746453, 1008404]
    
    In [668]:
    # Revenue from $10,000,001 to $50 Million
    stor13 = []
    for i in group_one:
        if 10000001<=i<=50000000:stor13.append(i)
    print(stor13)#showing the stor13 list    
    
    [46604054, 28270399, 36262783, 19859167, 35830713, 42843521, 21817298, 17499242, 20433227, 38969037, 11295324, 10153415, 21270290, 16566240, 13000000, 11000000, 38102988, 27118000, 19344615, 33473297, 15164458, 43440294, 35856053, 40716963, 18587135, 37799643, 30194409, 22281732, 20601987, 34044909, 33069303, 36787044, 16369708, 35185884, 20412841, 20412216, 19465835, 20412841, 19465835, 16566240, 20412841]
    
    In [669]:
    # Revenue from $50,000,001 to $100 Million
    stor14 = []
    for i in group_one:
        if 50000001 <= i<=100000000:stor14.append(i)
    print(stor14)#showing the stor14 list
    
    [77733867, 54766923, 73975239, 73986904, 63954968, 72071636, 65500000, 76086711, 59168692, 78356170, 62076141, 81831866, 67091915, 65167430, 65167430]
    
    In [670]:
    # Revenue from $100,000,001 to $200 Million
    stor15 = []
    for i in group_one:
        if 100000001 <= i<=200000000:stor15.append(i)
    print(stor15)#showing the stor15 list
    
    [127956187, 157297525]
    
    In [671]:
    # Revenue from $200,000,001 to $300 Million
    stor16 = []
    for i in group_one:
        if 200000001 <= i<=300000000:stor16.append(i)
    print(stor16)#showing the stor16 list
    
    [268000000, 213120004]
    
    In [672]:
    # Revenue greater than $300 Million
    stor17 = []
    for i in group_one:
        if 300000001 <= i:stor17.append(i)
    print(stor17)#showing the stor17 list
    
    [331266710, 304604712, 305937718, 438656843]
    

    Getting the revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget greater than $20 Million and up to $180 Million.

    In [659]:
    group_two = []
    for i in group_two_index:group_two.append(df_month['Revenue'][i])
    print(group_two)#showing the group_two list
    
    [449948323, 368567189, 74966854, 134612435, 50647416, 160558438, 77735925, 32398681, 38017873, 180047784, 96068724, 216601214, 38741732, 64605762, 66947950, 246100000, 143985708, 90482317, 693698673, 634454789, 137551594, 90552675, 213591522, 179748880, 108660270, 71004627, 203127894, 162498338, 169590606, 173567581, 85309093, 252276928, 41059418, 213120004, 66540205, 64282881, 50283563]
    

    Checking the number of elements in the 'group_two' list.

    In [660]:
    len(group_two)
    
    Out[660]:
    37

    Getting the maximum revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget greater than $20 Million and up to $180 Million.

    In [661]:
    max(group_two)
    
    Out[661]:
    693698673

    Using the 'round_to_multiple' function to round the revenue of Drama movies that were released from July to December with a budget greater than $20 Million and up to $180 Million to the nearest 50 Million.

    In [673]:
    rev_b_two = []
    for i in group_two_index:rev_b_two.append(round_to_multiple(df_month['Revenue'][i],50000000))
    print(rev_b_two)#showing the rev_b_two list
    
    [450000000, 350000000, 50000000, 150000000, 50000000, 150000000, 100000000, 50000000, 50000000, 200000000, 100000000, 200000000, 50000000, 50000000, 50000000, 250000000, 150000000, 100000000, 700000000, 650000000, 150000000, 100000000, 200000000, 200000000, 100000000, 50000000, 200000000, 150000000, 150000000, 150000000, 100000000, 250000000, 50000000, 200000000, 50000000, 50000000, 50000000]
    

    Showing how often each rounded revenue value occurs for Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget greater than $20 Million and up to $180 Million.

    In [674]:
    print(Counter(rev_b_two))
    
    Counter({50000000: 12, 150000000: 7, 100000000: 6, 200000000: 6, 250000000: 2, 450000000: 1, 350000000: 1, 700000000: 1, 650000000: 1})
    
    In [675]:
    # Revenue from $100 Million to $150 Million
    for i in group_two:
        if 100000000<=i<=150000000:print(i)
    
    134612435
    143985708
    137551594
    108660270
    
    In [676]:
    # Revenue from $150 Million to $200 Million
    for i in group_two:
        if 150000000 <= i<=200000000:print(i)
    
    160558438
    180047784
    179748880
    162498338
    169590606
    173567581
    
    In [677]:
    # Revenue from $200 Million to $250 Million
    for i in group_two:
        if 200000000 <= i<=250000000:print(i)
    
    216601214
    246100000
    213591522
    203127894
    213120004
    
    In [678]:
    # Revenue from $250 Million to $300 Million
    for i in group_two:
        if 250000000 <=i<=300000000:print(i)
    
    252276928
    
    In [680]:
    # Revenue from $350 Million to $450 Million
    for i in group_two:
        if 350000000 <=i<=450000000:print(i)
    
    449948323
    368567189
    
    In [681]:
    # Revenue from $550 Million to $650 Million
    for i in group_two:
        if 550000000<=i<=650000000:print(i)
    
    634454789
    
    In [682]:
    # Revenue from $650 Million to $800 Million
    for i in group_two:
        if 650000000<=i<=800000000:print(i)
    
    693698673
    

    Assigning a season to each R-rated Drama movie based on the month it was released (1 = December to February, 2 = March to May, 3 = June to August, 4 = September to November).

    In [85]:
    season_r = []
    for i in r_month:
        if i in [12,1,2]:season_r.append(1)
        if i in [3,4,5]:season_r.append(2)
        if i in [6,7,8]:season_r.append(3)
        if i in [9,10,11]:season_r.append(4)
    print(season_r)#showing the season_r list
    
    [1, 4, 2, 1, 1, 4, 1, 1, 4, 4, 4, 4, 2, 4, 3, 4, 1, 2, 4, 1, 4, 2, 4, 1, 3, 4, 4, 1, 4, 1, 4, 3, 1, 4, 4, 2, 2, 3, 4, 3, 2, 4, 4, 2, 1, 4, 2, 2, 3, 4, 2, 3, 1, 1, 4, 4]
    

    Assigning a season to each PG-rated Drama movie based on the month it was released.

    In [86]:
    season_pg = []
    for i in pg_month:
        if i in [12,1,2]:season_pg.append(1)
        if i in [3,4,5]:season_pg.append(2)
        if i in [6,7,8]:season_pg.append(3)
        if i in [9,10,11]:season_pg.append(4)
    print(season_pg)#showing the season_pg list
    
    [4, 4, 4, 2, 3, 1, 4, 3, 2, 3, 4, 1, 3, 1, 1, 4, 4, 3, 2, 1, 4, 4, 2, 2, 1, 3, 3, 2, 1, 1, 4, 4, 1, 4, 3, 4, 1, 1, 1, 3, 2, 3, 3, 2, 1, 2]
    

    Assigning a season to each G-rated Drama movie based on the month it was released.

    In [87]:
    season_g = []
    for i in g_month:
        if i in [12,1,2]:season_g.append(1)
        if i in [3,4,5]:season_g.append(2)
        if i in [6,7,8]:season_g.append(3)
        if i in [9,10,11]:season_g.append(4)
    print(season_g)#showing the season_g list
    
    [2, 4, 2, 4, 3, 3, 4, 3, 3, 2, 1, 4, 3, 2, 2, 2, 1, 3, 3, 1, 2, 4, 4, 4, 2]
    

    Assigning a season to each PG-13 rated Drama movie based on the month it was released.

    In [88]:
    season_pg13 = []
    for i in pg13_month:
        if i in [12,1,2]:season_pg13.append(1)
        if i in [3,4,5]:season_pg13.append(2)
        if i in [6,7,8]:season_pg13.append(3)
        if i in [9,10,11]:season_pg13.append(4)
    print(season_pg13)#showing the season_pg13 list
    
    [4, 1, 4, 4, 4, 1, 4, 4, 4, 3, 4, 1, 2, 4, 1, 1, 1, 2, 2, 3, 1, 2, 1, 4, 3, 1, 2, 3, 2, 1, 1, 3, 4, 4, 2, 2, 1, 2, 1, 1, 3, 4, 4, 1, 3, 3, 4, 2, 2, 1, 4, 1, 1, 2, 4, 3, 1, 2, 1, 2, 4, 4, 4, 3]
    

    Assigning a season to each NC-17 rated Drama movie based on the month it was released.

    In [89]:
    season_nc17 = []
    for i in nc17_month:
        if i in [12,1,2]:season_nc17.append(1)
        if i in [3,4,5]:season_nc17.append(2)
        if i in [6,7,8]:season_nc17.append(3)
        if i in [9,10,11]:season_nc17.append(4)
    print(season_nc17)#showing the season_nc17 list
    
    [1, 2, 4, 2, 2, 4, 2, 1, 4, 1, 4, 1, 1, 4, 1, 4, 2, 4, 4, 4, 1, 3, 3, 4, 4, 3, 3, 2, 1, 2, 4, 4, 4, 2]
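
The same month-to-season rule is repeated for every rating above. As a sketch, it could be written once and reused; `SEASONS`, `SEASON_BY_MONTH` and `month_to_season` are hypothetical names, and the season codes follow the cells above.

```python
# Season codes as used above: 1 = Dec-Feb, 2 = Mar-May, 3 = Jun-Aug, 4 = Sep-Nov.
SEASONS = {1: (12, 1, 2), 2: (3, 4, 5), 3: (6, 7, 8), 4: (9, 10, 11)}

# Invert the table so each month number maps directly to its season code.
SEASON_BY_MONTH = {month: season for season, months in SEASONS.items() for month in months}

def month_to_season(month):
    """Return the season code (1-4) for a month number (1-12)."""
    return SEASON_BY_MONTH[month]

sample_months = [12, 10, 4, 7]
sample_seasons = [month_to_season(m) for m in sample_months]
print(sample_seasons)
```

With such a helper, each rating's list becomes a one-line comprehension, for example `[month_to_season(m) for m in r_month]`.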
    

    Creating the df_season dataframe.

    In [74]:
    df_season = pd.DataFrame({'Season':season_r+season_pg+season_g+season_pg13+season_nc17,
                       "Opening_Weekend":r_opening_weekend+pg_opening_weekend
                       +g_opening_weekend+pg13_opening_weekend+nc17_opening_weekend,
                       "Profit":profit_int+profit_int1+profit_int2+profit_int3+profit_int4
                       })
    

    The 'df_season' dataframe. (this dataframe is interactive)

    In [75]:
    df_season
    
    Out[75]:
    Season Opening_Weekend Profit

    Creating a 3D scatter plot of the Season, Opening Weekend and Profit of the movies in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'animate' function and 'animation.FuncAnimation' from Matplotlib are used to turn the 3D scatter plot into a rotating animation.

    In [431]:
    def animate(i):
        # rotate the camera: the azimuth angle advances 4 degrees per frame
        ax.view_init(elev=10, azim=i*4)
        return fig
    
    
    fig = plt.figure()
    # create the 3D axes through the figure; Axes3D(fig) no longer adds itself to the figure in Matplotlib 3.6+
    ax = fig.add_subplot(projection='3d')
    ax.scatter(df['Season'],df['Opening_Weekend'],df['Profit'], alpha=0.5,s=50, color='#ff4500')
    
    ax.set_xlabel('Season')
    ax.set_ylabel('Opening_Weekend')
    ax.set_zlabel('Profit')
    
    ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
    ani
    
    Out[431]:
    <matplotlib.animation.FuncAnimation at 0x2bfc38f8820>

    Saving the animated 3D scatter plot gif as 'drama4.gif'.

    In [432]:
    writergif = animation.PillowWriter(fps=10)  # Pillow writes the GIF at 10 frames per second
    ani.save('drama4.gif', writer=writergif)  # pass the writer explicitly instead of relying on the ffmpeg fallback
    

    The third 3D Scatter Plot (part A): the x-axis is the 'Season', the y-axis is the 'Opening_Weekend' and the z-axis is the 'Profit'. The purpose of this animation is to partition the movies in the Drama Genre from the 'Drama_DataFrame' dataframe into k clusters, in which each observation belongs to the cluster with the nearest mean. The clusters are based on seasons and are then analyzed by observing the Profit generated per cluster.
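
The "nearest mean" rule can be illustrated with a small pure-Python version of the classic k-means loop (Lloyd's algorithm). This is only a sketch on four made-up points with hand-picked starting centers; it is not scikit-learn's `KMeans` implementation, and the names `nearest` and `kmeans` are hypothetical.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))

def kmeans(points, centers, iterations=10):
    """Alternate between assigning points and recomputing cluster means."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    # Sum of squared distances from each point to its assigned center.
    sse = sum(sum((p - q) ** 2 for p, q in zip(pt, centers[label]))
              for pt, label in zip(points, labels))
    return labels, centers, sse

# Made-up (season, value) points, not taken from the dataframe.
points = [(1, 1.0), (1, 2.0), (3, 9.0), (3, 10.0)]
labels, centers, sse = kmeans(points, centers=[points[0], points[2]])
print(labels, centers, sse)
```

Because the distance is Euclidean, a feature measured in millions (such as Profit) dominates one measured from 1 to 4 (such as Season) unless the features are scaled first.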

    Getting the Sum of Squared Errors (SSE) for the Season, Opening Weekend and Profit of the movies in the Drama Genre from the 'Drama_DataFrame' dataframe, for k = 1 to 9, to determine the optimal number of clusters.

    In [149]:
    k_rng =  range(1, 10)
    sse = []
    for k in k_rng:
        km = KMeans(n_clusters = k)
        km.fit(df[['Season','Opening_Weekend','Profit']])
        sse.append(km.inertia_)
    
    C:\Users\rutho\AppData\Roaming\Python\Python39\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
      warnings.warn(
    C:\Users\rutho\AppData\Roaming\Python\Python39\site-packages\sklearn\cluster\_kmeans.py:1334: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
      warnings.warn(
    

    Showing the 'sse' list.

    In [150]:
    sse
    
    Out[150]:
    [3.2618910215856246e+18,
     9.963479636909289e+17,
     5.2194459296822835e+17,
     3.142278548042057e+17,
     1.778114385874955e+17,
     1.046883781436286e+17,
     8.207687468107851e+16,
     6.136436882636602e+16,
     4.79187521252093e+16]

    Plotting the Sum of Squared Errors (SSE) against the number of clusters for the Drama movies from the 'Drama_DataFrame' dataframe. The elbow method indicates that the optimal number of clusters is two.

    In [151]:
    plt.xlabel('Number of clusters (k)')
    plt.ylabel('Sum of Squared Error')
    plt.plot(k_rng,sse)
    
    Out[151]:
    [<matplotlib.lines.Line2D at 0x2bfbe35ed30>]
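
One way to cross-check the elbow reading without the plot is to look at how much the SSE falls each time a cluster is added. The sketch below uses the 'sse' values printed above; the names `relative_drops` and `best_k` are introduced here for illustration only and are not part of the notebook.

```python
# SSE values copied from the 'sse' list shown above (k = 1 to 9)
sse = [3.2618910215856246e+18, 9.963479636909289e+17, 5.2194459296822835e+17,
       3.142278548042057e+17, 1.778114385874955e+17, 1.046883781436286e+17,
       8.207687468107851e+16, 6.136436882636602e+16, 4.79187521252093e+16]

# Fraction of the SSE removed when going from k-1 clusters to k clusters
relative_drops = {k: 1 - sse[k - 1] / sse[k - 2] for k in range(2, len(sse) + 1)}

for k, drop in relative_drops.items():
    print(f"k={k}: SSE falls by {drop:.0%}")

# The k with the largest relative improvement over k-1
best_k = max(relative_drops, key=relative_drops.get)
print("Largest relative drop at k =", best_k)
```

The largest single drop happens between one and two clusters, which is consistent with choosing two. The SSE values are on the order of 1e16 to 1e18, so they are driven by the dollar columns (Profit and Opening Weekend) rather than by Season, which only ranges from 1 to 4.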

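    Creating the cluster list: seasons 1 and 2 (Winter and Spring) are assigned to cluster 0, and seasons 3 and 4 (Summer and Autumn) to cluster 1.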

    In [79]:
    y_predicted = []
    for i in df_season["Season"]:
        if i in [1,2]:y_predicted.append(0)
        if i in [3,4]:y_predicted.append(1)
    

    Adding the cluster list to the 'df_season' dataframe.

    In [80]:
    df_season['cluster'] = y_predicted
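
The two `if` statements used to build 'y_predicted' add nothing to the list when a season value falls outside 1 to 4, which would leave the list shorter than the dataframe and make the column assignment fail. A minimal sketch of a stricter mapping follows; `SEASON_TO_CLUSTER` and `assign_clusters` are hypothetical names, not part of the notebook.

```python
# Coding taken from the cluster descriptions in this notebook:
# seasons 1 and 2 -> cluster 0 (Winter and Spring), seasons 3 and 4 -> cluster 1 (Summer and Autumn)
SEASON_TO_CLUSTER = {1: 0, 2: 0, 3: 1, 4: 1}

def assign_clusters(seasons):
    """Return one cluster label per season code, failing loudly on unknown codes."""
    labels = []
    for season in seasons:
        if season not in SEASON_TO_CLUSTER:
            raise ValueError(f"unexpected season code: {season!r}")
        labels.append(SEASON_TO_CLUSTER[season])
    return labels

# Example season codes (illustrative input)
print(assign_clusters([1, 4, 2, 1, 3]))
```

With this version the output always has one label per input row, or the call stops with an error that names the offending value.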
    

    The updated 'df_season' dataframe. (this dataframe is interactive)

    In [81]:
    df_season
    
    Out[81]:
    Season Opening_Weekend Profit cluster

    Creating a 3D scatter plot of the Season, Opening Weekend and Profit of the Drama movies from the 'Drama_DataFrame' dataframe, colored by cluster. The 'animate' function and the 'animation.FuncAnimation' class are used to create the animated 3D scatter plot object.

    In [438]:
    def animate(i):
        # Rotate the camera: the azimuth advances 4 degrees per frame (90 frames cover 0 to 360 degrees)
        ax.view_init(elev=10, azim=i*4)
        return fig
    
    
    fig = plt.figure()
    # Add the 3D axes through the figure; Axes3D(fig) is deprecated since Matplotlib 3.4
    ax = fig.add_subplot(projection='3d')
    
    # Split the movies by cluster
    df1 = df[df.cluster==0]
    df2 = df[df.cluster==1]
    
    # One scatter call per cluster, each with its own color
    ax.scatter(df1['Season'],df1['Opening_Weekend'],df1['Profit'], alpha=0.5,s=50, color='#ff4500')
    ax.scatter(df2['Season'],df2['Opening_Weekend'],df2['Profit'], alpha=0.5,s=50, color='#960018')
    
    # Label each axis with the column that is actually plotted on it
    ax.set_xlabel('Season')
    ax.set_ylabel('Opening_Weekend')
    ax.set_zlabel('Profit')
    
    ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
    ani     
    
    Out[438]:
    <matplotlib.animation.FuncAnimation at 0x2bfc3d67430>

    Saving the animated 3D scatter plot gif as 'drama5.gif'.

    In [439]:
    writergif = animation.PillowWriter(fps=10)  # Pillow writer at 10 frames per second
    ani.save('drama5.gif', writer=writergif)  # pass the writer explicitly so it is actually used
    

    The third 3D Scatter Plot (part B): the x-axis is the 'Season', the y-axis is the 'Opening Weekend' and the z-axis is the 'Profit'. The purpose of this animation is to show the Drama movies from the 'Drama_DataFrame' dataframe partitioned into two clusters. The clusters are based on seasons (Winter and Spring versus Summer and Autumn) and are then analyzed by the Opening Weekend and Profit of the movies in each cluster.

    Getting the index of all the Drama movies that were released in Winter and Spring (cluster 0) from the 'df_season' dataframe.

    In [90]:
    cluster_a_index = []
    for i,x in enumerate(df_season.cluster):
        if x == 0:cluster_a_index.append(i)
    print(cluster_a_index)#showing the cluster_a_index list
    
    [0, 2, 3, 4, 6, 7, 12, 16, 17, 19, 21, 23, 27, 29, 32, 35, 36, 40, 43, 44, 46, 47, 50, 52, 53, 59, 61, 64, 67, 69, 70, 74, 75, 78, 79, 80, 83, 84, 85, 88, 92, 93, 94, 96, 99, 100, 101, 102, 104, 111, 112, 115, 116, 117, 118, 121, 122, 126, 128, 132, 138, 139, 141, 142, 143, 144, 145, 147, 148, 149, 152, 153, 155, 156, 157, 161, 162, 163, 164, 165, 166, 170, 174, 175, 176, 178, 179, 180, 183, 184, 185, 186, 191, 192, 194, 195, 197, 198, 200, 202, 203, 205, 207, 211, 218, 219, 220, 224]
    

    Checking the number of elements in the 'cluster_a_index' list.

    In [84]:
    len(cluster_a_index)
    
    Out[84]:
    108

    Using the indexes from the 'cluster_a_index' list to get the Season, Profit and Opening Weekend of each movie that was released in Winter and Spring.

    In [92]:
    season_a = []
    profit_a = []
    open_a = []
    for i in cluster_a_index:
        season_a.append(df_season['Season'][i])
        profit_a.append(df_season['Profit'][i])
        open_a.append(df_season['Opening_Weekend'][i])
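
The same three lists can also be built without index lookups by filtering the rows once and transposing the result. The sketch below is only an illustration: the tuple layout is an assumption standing in for 'df_season', and the three rows are values taken from the lists printed in this section.

```python
# Three rows in an assumed layout: (Season, Opening_Weekend, Profit, cluster)
rows = [
    (1, 30122888, 349948323, 0),
    (4, 37513109, 307567189, 1),
    (2, 14953664, 24154026, 0),
]

# Keep only the Winter and Spring rows (cluster 0)
cluster_a_rows = [row for row in rows if row[3] == 0]

# zip(*...) transposes the rows into one tuple per column
season_a, open_a, profit_a, _ = (list(column) for column in zip(*cluster_a_rows))

print(season_a, open_a, profit_a)
```

Filtering first and transposing afterwards keeps the three lists aligned by construction, because they all come from the same filtered rows.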
    

    Showing the 'season_a' list.

    In [93]:
    print(season_a)
    
    [1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 2]
    

    Showing the 'profit_a' list.

    In [94]:
    print(profit_a)
    
    [349948323, 24154026, 326398492, 316350619, 82112435, 530998101, 8554727, 318266710, 25358392, 7859167, 34913, 45178935, 3765283, 12636004, 36954520, 20251930, 14610760, 88390, 12744931, 15566240, 156309, 294448, 68711836, 1851683, 556082, 72678948, 10531500, 447351353, 176601214, 26696000, 35694916, 10948425, 120587063, 69137047, 62667874, 83269971, 3835130, 118582776, 3101815, 107956187, 21856053, 104285432, 28716963, 108052686, 3943124, 71808942, 20000000, 1711143, 58693537, 1250000, 3851000, 58491516, 293281000, 278014195, 30482317, 55071636, 37707417, 10300000, 559454789, 129748880, 129590606, 78809717, 60143987, 49309093, 217276928, 26721826, 29802928, 167618160, 38984536, 66050951, 117033509, 71633833, 4847480, 57917283, 40282881, 317522294, 21028230, 36545707, 40506120, 113955898, 5601987, 20909437, 51603136, 21556959, 27087044, 12971021, 23787727, 29964656, 36699612, 13945682, 1205034, 12698355, 13912841, 4856268, 257845, 659312, 89410061, 121165, 13912841, 307113, 13912841, 15566240, 256669, 13912841, 94673038, 34897711, 401802, 858737]
    

    Getting the maximum Profit generated by Drama movies from the 'df_season' dataframe that were released in Winter and Spring.

    In [119]:
    max(profit_a)
    
    Out[119]:
    559454789

    Getting the minimum Profit generated by Drama movies from the 'df_season' dataframe that were released in Winter and Spring.

    In [120]:
    min(profit_a)
    
    Out[120]:
    34913

    Showing the 'open_a' list.

    In [95]:
    print(open_a)
    
    [30122888, 14953664, 46607250, 38560195, 24400000, 85171450, 1220335, 1443809, 237264, 224476, 160547, 47122, 24587, 473882, 8800230, 246914, 6661234, 81006, 3762145, 193728, 63461, 36134, 118150, 2105729, 63356, 16007426, 44542, 67877361, 16755310, 0, 12177488, 6011585, 22564512, 16007426, 9244641, 14466, 124011, 721341, 82601, 5609875, 93005, 89213, 0, 16015408, 46977, 8556935, 5088381, 0, 16021684, 0, 679185, 16021684, 4625583, 0, 10103675, 0, 0, 0, 35258, 526011, 143818, 16842353, 14789393, 7102085, 24830443, 372920, 13019686, 41202458, 13203458, 21401594, 30468614, 22618358, 9783603, 13002632, 129462, 9851102, 15002635, 8089139, 20874072, 30452, 30452, 8310232, 11727390, 2215891, 68266, 6213362, 13501349, 446380, 212000, 4690214, 53778, 55438, 361000, 69100, 0, 0, 738339, 143632, 361000, 142632, 361000, 193728, 24286, 361000, 738339, 100000, 70188, 0]
    

    Getting the maximum Opening Weekend generated by Drama movies from the 'df_season' dataframe that were released in Winter and Spring.

    In [110]:
    max(open_a)
    
    Out[110]:
    85171450

    Getting the minimum Opening Weekend generated by Drama movies from the 'df_season' dataframe that were released in Winter and Spring.

    In [111]:
    min(open_a)
    
    Out[111]:
    0
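
An Opening Weekend of exactly 0 may be a placeholder for a figure that was not reported rather than a real result. That is an assumption, since this section does not say how missing values were encoded. Under that assumption, a short sketch of a minimum that ignores the zeros, using a few values taken from the 'open_a' list above (`open_sample` and `reported` are illustrative names):

```python
# A few Opening Weekend values taken from the 'open_a' list, including zeros
open_sample = [30122888, 14953664, 0, 12177488, 14466, 0, 24587]

# Treat 0 as "not reported" (assumption) and ignore it when taking the minimum
reported = [value for value in open_sample if value > 0]

print("minimum including zeros:", min(open_sample))
print("minimum of reported values:", min(reported))
print("movies without a reported value:", len(open_sample) - len(reported))
```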

    Counting how many Drama movies from the 'Drama_DataFrame' dataframe were released in each of the two seasons, Winter and Spring. The counts are stored in a Counter (a dictionary subclass) called 'grouped_season_a'.

    In [98]:
    grouped_season_a= Counter(season_a)
    print(grouped_season_a)#showing the grouped_season_a list
    
    Counter({1: 58, 2: 50})
    

    Using the 'round_to_multiple' function to round the Opening Weekend of Drama movies that were released in Winter and Spring to the nearest $4 Million.

    In [103]:
    open1_a = []
    for i in open_a:open1_a.append(round_to_multiple(i,4000000))
    print(open1_a)#showing the open1_a list
    
    [32000000, 16000000, 48000000, 40000000, 24000000, 84000000, 0, 0, 0, 0, 0, 0, 0, 0, 8000000, 0, 8000000, 0, 4000000, 0, 0, 0, 0, 4000000, 0, 16000000, 0, 68000000, 16000000, 0, 12000000, 8000000, 24000000, 16000000, 8000000, 0, 0, 0, 0, 4000000, 0, 0, 0, 16000000, 0, 8000000, 4000000, 0, 16000000, 0, 0, 16000000, 4000000, 0, 12000000, 0, 0, 0, 0, 0, 0, 16000000, 16000000, 8000000, 24000000, 0, 12000000, 40000000, 12000000, 20000000, 32000000, 24000000, 8000000, 12000000, 0, 8000000, 16000000, 8000000, 20000000, 0, 0, 8000000, 12000000, 4000000, 0, 8000000, 12000000, 0, 0, 4000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
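
The 'round_to_multiple' function is defined elsewhere in the notebook and is not shown in this section. A minimal sketch of what such a helper might look like is given below, assuming it rounds to the nearest multiple; that assumption matches the values printed above, for example 30122888 becoming 32000000. The name `round_to_multiple_sketch` is used to make clear that this is a stand-in, not the notebook's own definition.

```python
def round_to_multiple_sketch(number, multiple):
    """Round a non-negative integer to the nearest multiple (exact halves round up)."""
    # Adding half a step before the integer division turns "floor" into "nearest"
    return ((number + multiple // 2) // multiple) * multiple

# Opening Weekend values from 'open_a', rounded to the nearest $4 Million
for value in (30122888, 85171450, 1220335):
    print(value, "->", round_to_multiple_sketch(value, 4000000))
```

Integer arithmetic is used here so that large dollar amounts are not affected by floating-point rounding.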
    

    Counting how often each rounded Opening Weekend value occurs among the Drama movies from the 'Drama_DataFrame' dataframe that were released in Winter and Spring. The counts are stored in a Counter called 'grouped_season_a'.

    In [104]:
    grouped_season_a = Counter(open1_a)
    print(grouped_season_a)#showing the grouped_season_a list
    
    Counter({0: 60, 8000000: 11, 16000000: 10, 4000000: 7, 12000000: 7, 24000000: 4, 32000000: 2, 40000000: 2, 20000000: 2, 48000000: 1, 84000000: 1, 68000000: 1})
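
The raw counts are easier to compare once they are expressed as shares of the cluster. The sketch below copies the counts from the Counter printed above; `opening_counts`, `total` and `share` are illustrative names.

```python
from collections import Counter

# Counts copied from the Counter printed above (rounded Opening Weekend -> number of movies)
opening_counts = Counter({0: 60, 8000000: 11, 16000000: 10, 4000000: 7, 12000000: 7,
                          24000000: 4, 32000000: 2, 40000000: 2, 20000000: 2,
                          48000000: 1, 84000000: 1, 68000000: 1})

total = sum(opening_counts.values())

# Print the buckets from the smallest to the largest rounded Opening Weekend
for value in sorted(opening_counts):
    share = opening_counts[value] / total
    print(f"${value / 1_000_000:>4.0f}M: {opening_counts[value]:>3} movies ({share:.1%})")
```

The total matches the 108 movies in the Winter and Spring cluster, and more than half of them fall into the $0 bucket, meaning their Opening Weekend rounds down to zero at a $4 Million step.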
    

    Using the 'round_to_multiple' function to round the Profit of Drama movies that were released in Winter and Spring to the nearest $10 Million.

    In [105]:
    profit_a_one = []
    for i in profit_a:profit_a_one.append(round_to_multiple(i,10000000))
    print(profit_a_one)#showing the profit_a_one list
    
    [350000000, 20000000, 330000000, 320000000, 80000000, 530000000, 10000000, 320000000, 30000000, 10000000, 0, 50000000, 0, 10000000, 40000000, 20000000, 10000000, 0, 10000000, 20000000, 0, 0, 70000000, 0, 0, 70000000, 10000000, 450000000, 180000000, 30000000, 40000000, 10000000, 120000000, 70000000, 60000000, 80000000, 0, 120000000, 0, 110000000, 20000000, 100000000, 30000000, 110000000, 0, 70000000, 20000000, 0, 60000000, 0, 0, 60000000, 290000000, 280000000, 30000000, 60000000, 40000000, 10000000, 560000000, 130000000, 130000000, 80000000, 60000000, 50000000, 220000000, 30000000, 30000000, 170000000, 40000000, 70000000, 120000000, 70000000, 0, 60000000, 40000000, 320000000, 20000000, 40000000, 40000000, 110000000, 10000000, 20000000, 50000000, 20000000, 30000000, 10000000, 20000000, 30000000, 40000000, 10000000, 0, 10000000, 10000000, 0, 0, 0, 90000000, 0, 10000000, 0, 10000000, 20000000, 0, 10000000, 90000000, 30000000, 0, 0]
    

    Counting how often each rounded Profit value occurs among the Drama movies from the 'Drama_DataFrame' dataframe that were released in Winter and Spring.

    In [107]:
    print(Counter(profit_a_one))
    
    Counter({0: 23, 10000000: 16, 20000000: 10, 30000000: 9, 40000000: 8, 70000000: 6, 60000000: 6, 320000000: 3, 80000000: 3, 50000000: 3, 120000000: 3, 110000000: 3, 130000000: 2, 90000000: 2, 350000000: 1, 330000000: 1, 530000000: 1, 450000000: 1, 180000000: 1, 100000000: 1, 290000000: 1, 280000000: 1, 560000000: 1, 220000000: 1, 170000000: 1})
    

    Getting the index of Drama movies from the 'df_season' dataframe that were released in Winter and Spring and that have an Opening Weekend of $10 Million to $20 Million.

    In [112]:
    group_one_index = []
    for i in cluster_a_index:
        if 10000000 <= df_season['Opening_Weekend'][i] <= 20000000:group_one_index.append(i) 
    print(group_one_index)#showing the group_one_index list
    
    [2, 59, 67, 70, 78, 96, 104, 115, 118, 139, 141, 145, 148, 156, 162, 174, 179]
    

    Checking the number of elements in the 'group_one_index' list.

    In [114]:
    len(group_one_index)
    
    Out[114]:
    17

    Getting the index of Drama movies from the 'df_season' dataframe that were released in Winter and Spring and that have an Opening Weekend of more than $20 Million and up to $90 Million.

    In [113]:
    group_two_index = []
    for i in cluster_a_index:
        if 20000001 <= df_season['Opening_Weekend'][i] <= 90000000:group_two_index.append(i) 
    print(group_two_index)#showing the group_two_index list
    
    [0, 3, 4, 6, 7, 64, 75, 143, 147, 149, 152, 153, 164]
    

    Checking the number of elements in the 'group_two_index' list.

    In [115]:
    len(group_two_index)
    
    Out[115]:
    13

    Getting the Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of $10 Million to $20 Million.

    In [116]:
    group_one = []
    for i in group_one_index:group_one.append(df_season['Profit'][i])
    print(group_one)#showing the group_one list
    
    [24154026, 72678948, 176601214, 35694916, 69137047, 108052686, 58693537, 58491516, 30482317, 78809717, 60143987, 29802928, 38984536, 57917283, 21028230, 51603136, 23787727]
    

    Checking the number of elements in the 'group_one' list.

    In [184]:
    len(group_one)
    
    Out[184]:
    17

    Getting the Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of more than $20 Million and up to $90 Million.

    In [117]:
    group_two = []
    for i in group_two_index:group_two.append(df_season['Profit'][i])
    print(group_two)#showing the group_two list
    
    [349948323, 326398492, 316350619, 82112435, 530998101, 447351353, 120587063, 217276928, 167618160, 66050951, 117033509, 71633833, 40506120]
    

    Checking the number of elements in the 'group_two' list.

    In [185]:
    len(group_two)
    
    Out[185]:
    13

    Counting how often each Profit value, rounded to the nearest $10 Million, occurs among the Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of $10 Million to $20 Million. The rounded values are stored in a list called 'profit_a_one'.

    In [122]:
    profit_a_one = []
    for i in group_one_index:profit_a_one.append(round_to_multiple(df_season['Profit'][i],10000000))
    
    Counter(profit_a_one)
    
    Out[122]:
    Counter({20000000: 3,
             70000000: 2,
             180000000: 1,
             40000000: 2,
             110000000: 1,
             60000000: 4,
             30000000: 2,
             80000000: 1,
             50000000: 1})

    The maximum Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of $10 Million to $20 Million is $180 Million (rounded to the nearest $10 Million).

    In [187]:
    max(profit_a_one)
    
    Out[187]:
    180000000

    The minimum Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of $10 Million to $20 Million is $20 Million (rounded to the nearest $10 Million).

    In [188]:
    min(profit_a_one)
    
    Out[188]:
    20000000

    In [190]:
    # Profits from $20 Million to $40 Million (7 of the 17 movies)
    for i in group_one:
        if 20000000 <= i<= 40000000:print(i)
    
    24154026
    35694916
    30482317
    29802928
    38984536
    21028230
    23787727
    
    In [191]:
    # Profits above $40 Million and up to $60 Million (4 of the 17 movies)
    for i in group_one:
        if 40000001 <= i<= 60000000:print(i)
    
    58693537
    58491516
    57917283
    51603136
    
    In [192]:
    # Profits above $60 Million and up to $80 Million (4 of the 17 movies)
    for i in group_one:
        if 60000001 <= i<= 80000000:print(i)
    
    72678948
    69137047
    78809717
    60143987
    
    In [195]:
    # Profits of $100 Million or more (2 of the 17 movies)
    for i in group_one:
        if 100000000 <= i:print(i)
    
    176601214
    108052686
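
The four range checks above can be collapsed into a single pass that assigns every Profit to exactly one bucket. This is a sketch using `bisect` from the standard library and the 'group_one' values printed earlier; the bucket edges and the names `upper_bounds`, `labels` and `bucket_counts` are illustrative choices, not the notebook's.

```python
from bisect import bisect_left
from collections import Counter

# Profits of the 17 Winter and Spring movies with a $10 Million to $20 Million Opening Weekend
group_one = [24154026, 72678948, 176601214, 35694916, 69137047, 108052686, 58693537,
             58491516, 30482317, 78809717, 60143987, 29802928, 38984536, 57917283,
             21028230, 51603136, 23787727]

# Inclusive upper bound of every bucket except the last, open-ended one
upper_bounds = [40_000_000, 60_000_000, 80_000_000, 100_000_000]
labels = ["up to $40M", "$40M-$60M", "$60M-$80M", "$80M-$100M", "over $100M"]

# bisect_left returns the position of the first upper bound that is >= the profit,
# so every profit lands in exactly one bucket
bucket_counts = Counter(labels[bisect_left(upper_bounds, profit)] for profit in group_one)

for label in labels:
    print(f"{label}: {bucket_counts[label]}")
```

Because the buckets share their edges, no value can be counted twice or fall into a gap between two ranges.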
    

    Counting how often each Profit value, rounded to the nearest $10 Million, occurs among the Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of more than $20 Million and up to $90 Million. The rounded values are stored in a list called 'profit_a_two'.

    In [124]:
    profit_a_two = []
    for i in group_two_index:profit_a_two.append(round_to_multiple(df_season['Profit'][i],10000000))
    
    Counter(profit_a_two)
    
    Out[124]:
    Counter({350000000: 1,
             330000000: 1,
             320000000: 1,
             80000000: 1,
             530000000: 1,
             450000000: 1,
             120000000: 2,
             220000000: 1,
             170000000: 1,
             70000000: 2,
             40000000: 1})

    The maximum Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of more than $20 Million and up to $90 Million is $530 Million (rounded to the nearest $10 Million).

    In [197]:
    max(profit_a_two)
    
    Out[197]:
    530000000

    The minimum Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of more than $20 Million and up to $90 Million is $40 Million (rounded to the nearest $10 Million).

    In [198]:
    min(profit_a_two)
    
    Out[198]:
    40000000

    In [201]:
    # Profits from $40 Million to $80 Million (3 of the 13 movies)
    for i in group_two:
        if 40000000 <= i<= 80000000:print(i)
    
    66050951
    71633833
    40506120
    
    In [202]:
    # Profits above $80 Million and up to $200 Million (4 of the 13 movies)
    for i in group_two:
        if 80000001 <= i<= 200000000:print(i)
    
    82112435
    120587063
    167618160
    117033509
    
    In [203]:
    # Profits above $200 Million and up to $400 Million (4 of the 13 movies)
    for i in group_two:
        if 200000001 <= i<=400000000:print(i)
    
    349948323
    326398492
    316350619
    217276928
    
    In [204]:
    # Profits above $400 Million and up to $500 Million (1 of the 13 movies)
    for i in group_two:
        if 400000001 <= i<=500000000:print(i)
    
    447351353
    
    In [205]:
    # Profits above $500 Million (1 of the 13 movies)
    for i in group_two:
        if 500000001 <= i:print(i)
    
    530998101
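
Maximum and minimum values are sensitive to a single outlier, so a median can give a steadier comparison of the two Opening Weekend groups. A small sketch using the 'group_one' and 'group_two' lists printed above and the standard `statistics` module (the names `median_one` and `median_two` are introduced here):

```python
from statistics import median

# Profits for Winter and Spring movies with a $10 Million to $20 Million Opening Weekend
group_one = [24154026, 72678948, 176601214, 35694916, 69137047, 108052686, 58693537,
             58491516, 30482317, 78809717, 60143987, 29802928, 38984536, 57917283,
             21028230, 51603136, 23787727]

# Profits for Winter and Spring movies with an Opening Weekend above $20 Million
group_two = [349948323, 326398492, 316350619, 82112435, 530998101, 447351353, 120587063,
             217276928, 167618160, 66050951, 117033509, 71633833, 40506120]

median_one = median(group_one)
median_two = median(group_two)

print(f"Median Profit, $10M-$20M openings: ${median_one:,}")
print(f"Median Profit, openings above $20M: ${median_two:,}")
print(f"Ratio: {median_two / median_one:.1f}x")
```

Both lists have an odd number of entries, so each median is an actual Profit from the data rather than an average of two values.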
    

    Getting the index of all the Drama movies that were released in Summer and Autumn (cluster 1) from the 'df_season' dataframe.

    In [125]:
    cluster_b_index = []
    for i,x in enumerate(df_season.cluster):
        if x == 1:cluster_b_index.append(i)
    print(cluster_b_index)#showing the cluster_b_index list
    
    [1, 5, 8, 9, 10, 11, 13, 14, 15, 18, 20, 22, 24, 25, 26, 28, 30, 31, 33, 34, 37, 38, 39, 41, 42, 45, 48, 49, 51, 54, 55, 56, 57, 58, 60, 62, 63, 65, 66, 68, 71, 72, 73, 76, 77, 81, 82, 86, 87, 89, 90, 91, 95, 97, 98, 103, 105, 106, 107, 108, 109, 110, 113, 114, 119, 120, 123, 124, 125, 127, 129, 130, 131, 133, 134, 135, 136, 137, 140, 146, 150, 151, 154, 158, 159, 160, 167, 168, 169, 171, 172, 173, 177, 181, 182, 187, 188, 189, 190, 193, 196, 199, 201, 204, 206, 208, 209, 210, 212, 213, 214, 215, 216, 217, 221, 222, 223]
    

    Checking the number of elements in the 'cluster_b_index' list.

    In [127]:
    len(cluster_b_index)
    
    Out[127]:
    117

    Using the indexes from the 'cluster_b_index' list to get the Season, Profit and Opening Weekend of each movie that was released in Summer and Autumn.

    In [126]:
    season_b = []
    profit_b = []
    open_b = []
    for i in cluster_b_index:
        season_b.append(df_season['Season'][i])
        profit_b.append(df_season['Profit'][i])
        open_b.append(df_season['Opening_Weekend'][i])
    

    Showing the 'season_b' list.

    In [128]:
    print(season_b)
    
    [4, 4, 4, 4, 4, 4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 3, 4, 3, 4, 4, 4, 3, 4, 3, 4, 4, 4, 4, 4, 3, 4, 3, 3, 4, 3, 4, 4, 3, 4, 4, 3, 3, 4, 4, 4, 3, 4, 3, 3, 3, 4, 4, 3, 3, 4, 3, 3, 4, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4, 4, 3, 4, 3, 3, 3, 4, 4, 3, 4, 4, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 4, 4, 3, 3, 4, 4, 4]
    

    Showing the 'open_b' list.

    In [134]:
    print(open_b)
    
    [37513109, 13143310, 736311, 24900566, 10470145, 492648, 19497324, 9700000, 5100000, 118298, 2002165, 253510, 13575172, 257174, 256498, 7485546, 52041, 387618, 561906, 135388, 84797, 156833, 1767308, 18623, 100268, 137651, 104030, 170335, 13307125, 2337594, 287081, 11364505, 19152401, 27547866, 11351389, 1203011, 0, 11351389, 27547866, 8146533, 5268764, 9178233, 13616196, 9421369, 6836036, 24517121, 20584908, 1528982, 2739680, 298277, 2189966, 89054, 2534729, 518795, 12146143, 2914486, 162146, 10028065, 7810481, 0, 21037414, 8742545, 11457353, 220297, 1586753, 0, 0, 0, 0, 55785112, 22403596, 11947744, 35574710, 220522, 320690, 24074047, 12381585, 15371203, 29632823, 11731703, 10003827, 26044590, 12305016, 18723269, 4765838, 105005, 5079566, 76244, 228359, 5467084, 187281, 15679190, 14065500, 4750894, 21688103, 9112839, 20321, 128140, 77740, 0, 85709, 63918, 100316, 100316, 649423, 11014818, 63918, 25775847, 0, 11166687, 31665, 245398, 0, 85709, 63918, 130303, 0]
    

    Getting the maximum Opening Weekend generated by Drama movies from the 'df_season' dataframe that were released in Summer and Autumn.

    In [132]:
    max(open_b)
    
    Out[132]:
    55785112

    Getting the minimum Opening Weekend generated by Drama movies from the 'df_season' dataframe that were released in Summer and Autumn.

    In [133]:
    min(open_b)
    
    Out[133]:
    0

    Showing the 'profit_b' list.

    In [135]:
    print(profit_b)
    
    [307567189, 19966854, 13147416, 129558438, 54735925, 9898681, 17017873, 26604054, 8270399, 23262783, 23830713, 31043521, 60133905, 12417298, 69233867, 12499242, 222016, 53273049, 17033227, 35669037, 14131551, 9295324, 8153415, 4328516, 19282640, 4438911, 2669782, 48766923, 14718173, 1500000, 2000000, 47784, 59068724, 284604712, 70975239, 4609597, 36918287, 70986904, 285937718, 33102988, 4344615, 6741732, 74830111, 34605762, 32973297, 120036382, 81120329, 48954968, 5164458, 31440294, 12815212, 150297525, 7423752, 544368315, 42892670, 11587135, 418656843, 43947950, 12469621, 35099643, 255500000, 216100000, 58985708, 7657973, 941214868, 267142000, 23794409, 52500000, 5850377, 583698673, 77551594, 35552675, 163591522, 58660270, 22004627, 156127894, 4478084, 122498338, 136567581, 132552290, 15059418, 188120004, 41540205, 188265198, 2281732, 57086711, 44168692, 20044909, 20069303, 11477345, 67356170, 51076141, 72831866, 10369708, 143806510, 33185884, 4152584, 3478400, 1927779, 8404, 18912216, 52091915, 15465835, 15390895, 1315026, 201120004, 50167430, 2311944, 2548651, 16283563, 3664240, 1038916, 8000000, 18912216, 50167430, 3546453, 958404]
    

    Getting the maximum Profit generated by Drama movies from the 'df_season' dataframe that were released in Summer and Autumn.

    In [136]:
    max(profit_b)
    
    Out[136]:
    941214868

    Getting the minimum Profit generated by Drama movies from the 'df_season' dataframe that were released in Summer and Autumn.

    In [137]:
    min(profit_b)
    
    Out[137]:
    8404

    Counting how many Drama movies from the 'df_season' dataframe were released in each of the two seasons, Summer and Autumn. The counts are stored in a Counter called 'grouped_season_b'.

    In [131]:
    grouped_season_b= Counter(season_b)
    print(grouped_season_b)#showing the grouped_season_b Counter
    
    Counter({4: 78, 3: 39})
    

    Using the 'round_to_multiple' function to round the Opening Weekend of Drama movies that were released in Summer and Autumn to the nearest $10 Million.

    In [138]:
    open1_b = []
    for i in open_b:open1_b.append(round_to_multiple(i,10000000))
    print(open1_b)#showing the open1_b list
    
    [40000000, 10000000, 0, 20000000, 10000000, 0, 20000000, 10000000, 10000000, 0, 0, 0, 10000000, 0, 0, 10000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10000000, 0, 0, 10000000, 20000000, 30000000, 10000000, 0, 0, 10000000, 30000000, 10000000, 10000000, 10000000, 10000000, 10000000, 10000000, 20000000, 20000000, 0, 0, 0, 0, 0, 0, 0, 10000000, 0, 0, 10000000, 10000000, 0, 20000000, 10000000, 10000000, 0, 0, 0, 0, 0, 0, 60000000, 20000000, 10000000, 40000000, 0, 0, 20000000, 10000000, 20000000, 30000000, 10000000, 10000000, 30000000, 10000000, 20000000, 0, 0, 10000000, 0, 0, 10000000, 0, 20000000, 10000000, 0, 20000000, 10000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10000000, 0, 30000000, 0, 10000000, 0, 0, 0, 0, 0, 0, 0]
    

    Counting how often each rounded Opening Weekend value occurs among the Drama movies from the 'df_season' dataframe that were released in Summer and Autumn. The counts are stored in a Counter called 'grouped_season_b'.

    In [140]:
    grouped_season_b = Counter(open1_b)
    print(grouped_season_b)#showing the grouped_season_b list
    
    Counter({0: 65, 10000000: 32, 20000000: 12, 30000000: 5, 40000000: 2, 60000000: 1})
    

    Using the 'round_to_multiple' function to round the Profit of Drama movies that were released in Summer and Autumn to the nearest $10 Million.

    In [142]:
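    profit1_b = []
    for i in profit_b:profit1_b.append(round_to_multiple(i,10000000))  # round the Profit values (profit_b), not the Opening Weekend values
    print(profit1_b)#showing the profit1_b list
    
    [310000000, 20000000, 10000000, 130000000, 50000000, 10000000, 20000000, 30000000, 10000000, 20000000, 20000000, 30000000, 60000000, 10000000, 70000000, 10000000, 0, 50000000, 20000000, 40000000, 10000000, 10000000, 10000000, 0, 20000000, 0, 0, 50000000, 10000000, 0, 0, 0, 60000000, 280000000, 70000000, 0, 40000000, 70000000, 290000000, 30000000, 0, 10000000, 70000000, 30000000, 30000000, 120000000, 80000000, 50000000, 10000000, 30000000, 10000000, 150000000, 10000000, 540000000, 40000000, 10000000, 420000000, 40000000, 10000000, 40000000, 260000000, 220000000, 60000000, 10000000, 940000000, 270000000, 20000000, 50000000, 10000000, 580000000, 80000000, 40000000, 160000000, 60000000, 20000000, 160000000, 0, 120000000, 140000000, 130000000, 20000000, 190000000, 40000000, 190000000, 0, 60000000, 40000000, 20000000, 20000000, 10000000, 70000000, 50000000, 70000000, 10000000, 140000000, 30000000, 0, 0, 0, 0, 20000000, 50000000, 20000000, 20000000, 0, 200000000, 50000000, 0, 0, 20000000, 0, 0, 10000000, 20000000, 50000000, 0, 0]
    

    Counting how often each rounded Profit value occurs among the Drama movies from the 'df_season' dataframe that were released in Summer and Autumn. The counts are stored in a Counter called 'grouped_season_b'.

    In [143]:
    grouped_season_b= Counter(profit1_b)
    print(grouped_season_b)#showing the grouped_season_b Counter
    
    Counter({0: 22, 10000000: 20, 20000000: 16, 50000000: 9, 40000000: 8, 30000000: 7, 70000000: 6, 60000000: 5, 130000000: 2, 120000000: 2, 80000000: 2, 160000000: 2, 140000000: 2, 190000000: 2, 310000000: 1, 280000000: 1, 290000000: 1, 150000000: 1, 540000000: 1, 420000000: 1, 260000000: 1, 220000000: 1, 940000000: 1, 270000000: 1, 580000000: 1, 200000000: 1})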
    

    Getting the index of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn and that have an Opening Weekend of $1 Million to $10 Million.

    In [145]:
    group_one_index = []
    for i in cluster_b_index:
        if 1000000 <= df_season['Opening_Weekend'][i] <= 10000000:group_one_index.append(i) 
    print(group_one_index)#showing the group_one_index list
    
    [14, 15, 20, 28, 39, 54, 62, 68, 71, 72, 76, 77, 86, 87, 90, 95, 103, 107, 110, 119, 159, 167, 171, 181, 187]
    

    Checking the number of elements in the 'group_one_index' list.

    In [146]:
    len(group_one_index)
    
    Out[146]:
    25

    Getting the index of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn and that have an Opening Weekend of more than $10 Million and up to $60 Million.

    In [148]:
    group_two_index = []
    for i in cluster_b_index:
        if 10000001 <= df_season['Opening_Weekend'][i] :group_two_index.append(i) 
    print(group_two_index)#showing the group_two_index list
    
    [1, 5, 9, 10, 13, 24, 51, 56, 57, 58, 60, 65, 66, 73, 81, 82, 98, 106, 109, 113, 127, 129, 130, 131, 135, 136, 137, 140, 146, 150, 151, 154, 158, 173, 177, 182, 208, 210, 213]
    

    Checking the number of elements in the 'group_two_index' list.

    In [149]:
    len(group_two_index)
    
    Out[149]:
    39

    Getting the Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $1 Million to $10 Million.

    In [151]:
    group_one = []
    for i in group_one_index:group_one.append(df_season['Profit'][i])
    print(group_one)#showing the group_one list
    
    [26604054, 8270399, 23830713, 12499242, 8153415, 1500000, 4609597, 33102988, 4344615, 6741732, 34605762, 32973297, 48954968, 5164458, 12815212, 7423752, 11587135, 12469621, 216100000, 941214868, 2281732, 44168692, 11477345, 10369708, 33185884]
    

    Getting the Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of more than $10 Million and up to $60 Million.

    In [152]:
    group_two = []
    for i in group_two_index:group_two.append(df_season['Profit'][i])
    print(group_two)#showing the group_two list
    
    [307567189, 19966854, 129558438, 54735925, 17017873, 60133905, 14718173, 47784, 59068724, 284604712, 70975239, 70986904, 285937718, 74830111, 120036382, 81120329, 42892670, 43947950, 255500000, 58985708, 583698673, 77551594, 35552675, 163591522, 156127894, 4478084, 122498338, 136567581, 132552290, 15059418, 188120004, 41540205, 188265198, 51076141, 72831866, 143806510, 201120004, 2311944, 16283563]
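
The index list followed by a second loop can be replaced by one helper that pairs each Opening Weekend with its Profit. This is a sketch under stated assumptions: `profits_in_opening_range` is a hypothetical name, and the inputs are the first eight entries of the 'open_b' and 'profit_b' lists printed above.

```python
def profits_in_opening_range(opening_weekends, profits, low, high=None):
    """Return the profits whose opening weekend lies in [low, high]; no upper limit if high is None."""
    return [
        profit
        for opening, profit in zip(opening_weekends, profits)
        if opening >= low and (high is None or opening <= high)
    ]

# First eight Summer and Autumn movies from 'open_b' and 'profit_b'
open_b_head = [37513109, 13143310, 736311, 24900566, 10470145, 492648, 19497324, 9700000]
profit_b_head = [307567189, 19966854, 13147416, 129558438, 54735925, 9898681, 17017873, 26604054]

# Same thresholds as the two cells above: $1M to $10M, and more than $10M
small_openings = profits_in_opening_range(open_b_head, profit_b_head, 1000000, 10000000)
large_openings = profits_in_opening_range(open_b_head, profit_b_head, 10000001)

print(small_openings)
print(large_openings)
```

The same helper could be reused for the Winter and Spring cluster with different thresholds, which would avoid rebinding 'group_one' and 'group_two' for each cluster.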
    

    Checking the number of elements in the 'group_one' list.

    In [218]:
    len(group_one)
    
    Out[218]:
    25

    Checking the number of elements in the 'group_two' list.

    In [219]:
    len(group_two)
    
    Out[219]:
    39

    Counting how often each Profit value, rounded to the nearest $50 Million, occurs among the Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $1 Million to $10 Million. The rounded values are stored in a list called 'profit_b_one'.

    In [220]:
    profit_b_one = []
    for i in group_one:profit_b_one.append(round_to_multiple(i,50000000))
    
    collections.Counter(profit_b_one)
    
    Out[220]:
    Counter({50000000: 7, 0: 16, 200000000: 1, 950000000: 1})
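    The 'round_to_multiple' helper is defined elsewhere in the notebook and is not shown in this section. The sketch below is one possible definition (an assumption, not necessarily the author's version) that rounds a value to the nearest multiple. Applied to the 'group_one' profits printed above, it gives the same counts as Out[220].

```python
from collections import Counter


def round_to_multiple(value, multiple):
    """Hypothetical helper: round value to the nearest multiple of `multiple`."""
    return multiple * round(value / multiple)


# Profit values copied from the 'group_one' list printed earlier in this section
group_one = [26604054, 8270399, 23830713, 12499242, 8153415, 1500000, 4609597,
             33102988, 4344615, 6741732, 34605762, 32973297, 48954968, 5164458,
             12815212, 7423752, 11587135, 12469621, 216100000, 941214868,
             2281732, 44168692, 11477345, 10369708, 33185884]

# Count how many profits fall on each $50 Million step
profit_bins = Counter(round_to_multiple(p, 50_000_000) for p in group_one)
print(profit_bins)
```

    Rounding to a coarse step like $50 Million hides detail below $25 Million (16 of the 25 movies land in the 0 bin), which is why the narrower range checks that follow are still useful.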

    The maximum Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $1 Million to $11 Million is about $941 Million.

    In [153]:
    max(group_one)
    
    Out[153]:
    941214868
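    When a raw figure such as 941214868 is quoted in a caption, a small formatting helper can keep the rounded wording consistent with the output. The function name and the rounding precision below are illustrative choices, not part of the original notebook.

```python
def format_dollars(amount):
    """Hypothetical helper: render a dollar amount the way a caption would quote it."""
    if amount >= 1_000_000_000:
        return f"${amount / 1_000_000_000:.2f} Billion"
    if amount >= 1_000_000:
        return f"${amount / 1_000_000:.1f} Million"
    return f"${amount:,}"


print(format_dollars(941214868))  # maximum profit in group_one
print(format_dollars(1500000))    # minimum profit in group_one
```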

    The minimum Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $1 Million to $11 Million is $1.5 Million.

    In [222]:
    min(group_one)
    
    Out[222]:
    1500000
    In [224]:
    # Profits from $10 Million to $20 Million: 6 of 25 movies (24%)
    for i in group_one:
        if 10000000 <= i <= 20000000:
            print(i)
    
    12499242
    12815212
    11587135
    12469621
    11477345
    10369708
    
    In [226]:
    # Profits above $20 Million, up to $50 Million: 8 of 25 movies (32%)
    for i in group_one:
        if 20000001 <= i <= 50000000:
            print(i)
    
    26604054
    23830713
    33102988
    34605762
    32973297
    48954968
    44168692
    33185884
    
    In [229]:
    # Profits above $50 Million: 2 of 25 movies (8%)
    for i in group_one:
        if 50000001 <= i:
            print(i)
    
    216100000
    941214868
    

    Showing the frequency of the Profit values, rounded to the nearest $50 Million, of the Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $11 Million to $60 Million. The rounded values are stored in a list called 'profit_b_two'.

    In [230]:
    profit_b_two = [round_to_multiple(i, 50000000) for i in group_two]
    
    collections.Counter(profit_b_two)
    
    Out[230]:
    Counter({300000000: 3,
             0: 8,
             150000000: 6,
             50000000: 13,
             100000000: 4,
             250000000: 1,
             600000000: 1,
             200000000: 3})

    The maximum Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $11 Million to $60 Million is about $584 Million.

    In [231]:
    max(group_two)
    
    Out[231]:
    583698673

    The minimum Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $11 Million to $60 Million is $47,784.

    In [232]:
    min(group_two)
    
    Out[232]:
    47784
    In [233]:
    # Profits up to $1 Million: 1 of 39 movies (3%)
    for i in group_two:
        if 0 <= i <= 1000000:
            print(i)
    
    47784
    
    In [235]:
    # Profits above $1 Million, up to $10 Million: 2 of 39 movies (5%)
    for i in group_two:
        if 1000001 <= i <= 10000000:
            print(i)
    
    4478084
    2311944
    
    In [237]:
    # Profits above $10 Million, up to $100 Million: 20 of 39 movies (51%)
    for i in group_two:
        if 10000001 <= i <= 100000000:
            print(i)
    
    19966854
    54735925
    17017873
    60133905
    14718173
    59068724
    70975239
    70986904
    74830111
    81120329
    42892670
    43947950
    58985708
    77551594
    35552675
    15059418
    41540205
    51076141
    72831866
    16283563
    
    In [238]:
    # Profits above $100 Million, up to $200 Million: 10 of 39 movies (26%)
    for i in group_two:
        if 100000001 <= i <= 200000000:
            print(i)
    
    129558438
    120036382
    163591522
    156127894
    122498338
    136567581
    132552290
    188120004
    188265198
    143806510
    
    In [239]:
    # Profits above $200 Million, up to $400 Million: 5 of 39 movies (13%)
    for i in group_two:
        if 200000001 <= i <= 400000000:
            print(i)
    
    307567189
    284604712
    285937718
    255500000
    201120004
    
    In [240]:
    # Profits above $400 Million, up to $600 Million: 1 of 39 movies (3%)
    for i in group_two:
        if 400000001 <= i <= 600000000:
            print(i)
    
    583698673
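    The separate range checks on 'group_two' can also be done in one pass. The sketch below uses 'bisect' from the standard library to assign each profit to a range with an inclusive upper bound; the names 'edges' and 'labels' are illustrative and not from the original notebook.

```python
from bisect import bisect_left
from collections import Counter

# Profit values copied from the 'group_two' list printed earlier in this section
group_two = [307567189, 19966854, 129558438, 54735925, 17017873, 60133905,
             14718173, 47784, 59068724, 284604712, 70975239, 70986904,
             285937718, 74830111, 120036382, 81120329, 42892670, 43947950,
             255500000, 58985708, 583698673, 77551594, 35552675, 163591522,
             156127894, 4478084, 122498338, 136567581, 132552290, 15059418,
             188120004, 41540205, 188265198, 51076141, 72831866, 143806510,
             201120004, 2311944, 16283563]

# Inclusive upper bounds of the ranges checked cell by cell above
edges = [1_000_000, 10_000_000, 100_000_000, 200_000_000, 400_000_000, 600_000_000]
labels = ["up to $1M", "$1M-$10M", "$10M-$100M",
          "$100M-$200M", "$200M-$400M", "$400M-$600M"]

# bisect_left returns the position of the first edge that is >= profit.
# This assumes no profit exceeds the last edge, which holds for this list.
counts = Counter(labels[bisect_left(edges, profit)] for profit in group_two)

for label in labels:
    share = 100 * counts[label] / len(group_two)
    print(f"{label:>12}: {counts[label]:2d} movies ({share:.0f}%)")
```

    Computing every range from one list of edges makes it harder for a count or a percentage in a comment to drift away from the condition it describes.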
    

    Creating the df_4D dataframe.

    In [155]:
    df_4D = pd.DataFrame({
        'Budget': r_cost + pg_cost + g_cost + pg13_cost + nc17_cost,
        'Season': season_r + season_pg + season_g + season_pg13 + season_nc17,
        "Month_Realesed": r_month + pg_month + g_month + pg13_month + nc17_month,
        "Opening_Weekend": (r_opening_weekend + pg_opening_weekend + g_opening_weekend
                            + pg13_opening_weekend + nc17_opening_weekend),
    })
    

    The 'df_4D' dataframe. (this dataframe is interactive)

    In [156]:
    df_4D
    
    Out[156]:
    Budget Season Month_Realesed Opening_Weekend

    Creating a 4D scatter plot of the Budget, Season, Month Released and Opening Weekend of the Drama movies in the 'df_4D' dataframe. The 'animate' function and 'animation.FuncAnimation' from matplotlib are used to turn the plot into a rotating animation.

    In [442]:
    def animate(i):
        # Rotate the camera around the vertical axis, 4 degrees per frame
        ax.view_init(elev=10, azim=i*4)
        return fig


    fig = plt.figure()
    # Create the 3D axes on the figure (Axes3D(fig) is deprecated since matplotlib 3.4)
    ax = fig.add_subplot(projection='3d')

    x = df_4D['Budget']
    y = df_4D['Season']
    z = df_4D['Month_Realesed']
    c = df_4D['Opening_Weekend']
    # The fourth dimension, Opening Weekend, is shown as the colour of each point
    points = ax.scatter(x, y, z, c=c, alpha=0.5, s=50, cmap='Reds_r')

    ax.set_xlabel('Budget')
    ax.set_ylabel('Season')
    ax.set_zlabel('Month_Realesed')
    # Tie the colourbar to the scatter so its scale shows the Opening Weekend values
    fig.colorbar(points, ax=ax, aspect=5, shrink=0.5)

    ani = animation.FuncAnimation(fig, animate, frames=90, interval=50, blit=False)
    ani
    
    Out[442]:
    <matplotlib.animation.FuncAnimation at 0x2bfc24fb1f0>

    Saving the animated 4D scatter plot as the GIF 'drama6.gif'.

    In [443]:
    # Pass the Pillow writer explicitly so the GIF is written at 10 frames per second
    writergif = animation.PillowWriter(fps=10)
    ani.save('drama6.gif', writer=writergif)
    

    The first 4D Scatter Plot (part A): the x-axis is the 'Budget', the y-axis is the 'Season', the z-axis is the 'Month Released' and the colour axis (c) is the 'Opening Weekend'. The purpose of this animation is to partition the Drama movies from the 'Drama_DataFrame' dataframe into clusters. These clusters are based on seasons; the Opening Weekend will be analyzed based on the budget of the movies.

    Creating the cluster list.

    In [158]:
    y_predicted = []
    for i in df_4D["Season"]:
        # Seasons 1 to 4 become cluster labels 0 to 3
        if i == 1: y_predicted.append(0)
        elif i == 2: y_predicted.append(1)
        elif i == 3: y_predicted.append(2)
        elif i == 4: y_predicted.append(3)
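    The season-to-cluster step is a fixed lookup, so it can also be written as a dictionary mapping. This is a minimal sketch with a made-up 'seasons' list standing in for the 'Season' column; the name 'season_to_cluster' is illustrative.

```python
# Hypothetical sample standing in for df_4D["Season"]
seasons = [1, 2, 3, 4, 1]

# Each season code maps to one cluster label
season_to_cluster = {1: 0, 2: 1, 3: 2, 4: 3}

# A KeyError here would flag an unexpected season code instead of silently skipping it
y_predicted = [season_to_cluster[s] for s in seasons]
print(y_predicted)
```

    Unlike a chain of 'if' checks, the lookup fails loudly on an unknown season, so the cluster list cannot end up shorter than the dataframe it is added to.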
    

    Adding the cluster list to the 'df_4D' dataframe.

    In [446]:
    df_4D['cluster'] = y_predicted
    

    The updated 'df_4D' dataframe. (this dataframe is interactive)

    In [159]:
    df_4D
    
    Out[159]:
    Budget Season Month_Realesed Opening_Weekend

    Creating a scatter plot of the Budget, Season and Month Released of the Drama movies in the 'df_4D' dataframe, with one colour per season cluster. The 'animate' function and 'animation.FuncAnimation' from matplotlib are used to turn the plot into a rotating animation.

    In [447]:
    def animate(i):
        # Rotate the camera around the vertical axis, 4 degrees per frame
        ax.view_init(elev=10, azim=i*4)
        return fig


    fig = plt.figure()
    # Create the 3D axes on the figure (Axes3D(fig) is deprecated since matplotlib 3.4)
    ax = fig.add_subplot(projection='3d')

    # One colour per cluster; clusters 0 to 3 correspond to Seasons 1 to 4
    cluster_colors = {0: '#C40233', 1: 'red', 2: '#F400A1', 3: 'purple'}
    for cluster_id, color in cluster_colors.items():
        subset = df_4D[df_4D.cluster == cluster_id]
        ax.scatter(subset['Budget'], subset['Season'], subset['Month_Realesed'],
                   alpha=0.5, s=50, color=color)

    ax.set_xlabel('Budget')
    ax.set_ylabel('Season')
    ax.set_zlabel('Month_Realesed')

    ani = animation.FuncAnimation(fig, animate, frames=90, interval=50, blit=False)
    ani
    
    Out[447]:
    <matplotlib.animation.FuncAnimation at 0x2bfc38f8250>

    Saving the animated scatter plot as the GIF 'drama7.gif'.

    In [448]:
    # Pass the Pillow writer explicitly so the GIF is written at 10 frames per second
    writergif = animation.PillowWriter(fps=10)
    ani.save('drama7.gif', writer=writergif)
    

    The first 4D Scatter Plot (part B): the x-axis is the 'Budget', the y-axis is the 'Season', the z-axis is the 'Month Released' and the colour of each point shows its season cluster. The purpose of this animation is to partition the Drama movies from the 'Drama_DataFrame' dataframe into clusters. These clusters are based on the seasons Winter, Spring, Summer and Autumn; the Opening Weekend will be analyzed based on the budget of the movies.

    Getting the index of all the Drama movies in the 'df_4D' dataframe that were released in Winter.

    In [161]:
    cluster_a_index = [i for i, x in enumerate(df_4D.Season) if x == 1]
    print(cluster_a_index)  # showing the cluster_a_index list
    
    [0, 3, 4, 6, 7, 16, 19, 23, 27, 29, 32, 44, 52, 53, 61, 67, 69, 70, 75, 80, 84, 85, 88, 92, 93, 94, 100, 112, 118, 121, 128, 132, 138, 141, 142, 143, 147, 149, 152, 156, 157, 163, 165, 166, 170, 176, 178, 179, 183, 185, 191, 198, 200, 202, 203, 205, 211, 219]
    

    Checking the number of elements in the 'cluster_a_index' list.

    In [254]:
    len(cluster_a_index)
    
    Out[254]:
    58

    Using the indexes from the 'cluster_a_index' list to get the Season, Budget, Opening Weekend and Month Released of each movie that was released in Winter.

    In [163]:
    season_a = []
    budget_a = []
    open_a = []
    month_a = []
    for i in cluster_a_index:
        season_a.append(df_4D['Season'][i])
        budget_a.append(df_4D['Budget'][i])
        open_a.append(df_4D['Opening_Weekend'][i])
        month_a.append(df_4D['Month_Realesed'][i])
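    The same per-season collection can be done for every season in a single pass, instead of one index list and four value lists per season. The sketch below is only an illustration: it uses four (season, budget, opening weekend, month) rows taken from the lists in this section, and the 'by_season' structure is a hypothetical name.

```python
from collections import defaultdict

# A few (season, budget, opening_weekend, month) rows taken from this section
rows = [
    (1, 100000000, 30122888, 12),
    (2, 60000000, 14953664, 5),
    (1, 55000000, 46607250, 2),
    (2, 22500000, 1220335, 4),
]

# One dictionary of column lists per season, filled in a single pass
by_season = defaultdict(lambda: {"budget": [], "opening": [], "month": []})
for season, budget, opening, month in rows:
    by_season[season]["budget"].append(budget)
    by_season[season]["opening"].append(opening)
    by_season[season]["month"].append(month)

print(by_season[1]["budget"])   # Winter budgets
print(by_season[2]["opening"])  # Spring opening weekends
```

    Keeping each season's values under its own key avoids reusing a list from another season by mistake when the same steps are repeated for Winter, Spring, Summer and Autumn.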
    

    Showing the 'season_a' list.

    In [164]:
    print(season_a)
    
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    

    Showing the 'month_a' list.

    In [165]:
    print(month_a)
    
    [12, 2, 2, 12, 2, 12, 12, 1, 1, 1, 2, 12, 12, 1, 2, 12, 12, 1, 2, 1, 1, 2, 12, 12, 2, 12, 2, 12, 12, 12, 12, 12, 12, 1, 12, 12, 2, 2, 2, 2, 12, 2, 2, 12, 1, 12, 1, 1, 1, 12, 12, 2, 1, 2, 12, 12, 12, 1]
    

    Showing the 'budget_a' list.

    In [166]:
    print(budget_a)
    
    [100000000, 55000000, 55000000, 52500000, 40000000, 13000000, 12000000, 11000000, 7000000, 4900000, 3500000, 1000000, 100000, 2700000, 1700000, 40000000, 422000, 11800000, 17000000, 23000000, 16000000, 3000000, 20000000, 14000000, 15000000, 12000000, 8200000, 666000, 60000000, 17000000, 75000000, 50000000, 40000000, 37000000, 36000000, 35000000, 30000000, 28000000, 25000000, 25000000, 24000000, 16000000, 15000000, 15000000, 12000000, 9700000, 9000000, 7400000, 5000000, 5000000, 6500000, 15000000, 6500000, 15000000, 6500000, 1000000, 6500000, 1250000]
    

    Getting the maximum Budget of the Drama movies from the 'df_4D' dataframe that were released in Winter.

    In [167]:
    max(budget_a)
    
    Out[167]:
    100000000

    Getting the minimum Budget of the Drama movies from the 'df_4D' dataframe that were released in Winter.

    In [168]:
    min(budget_a)
    
    Out[168]:
    100000

    Showing the 'open_a' list.

    In [169]:
    print(open_a)
    
    [30122888, 46607250, 38560195, 24400000, 85171450, 1443809, 224476, 47122, 24587, 473882, 8800230, 193728, 2105729, 63356, 44542, 16755310, 0, 12177488, 22564512, 14466, 721341, 82601, 5609875, 93005, 89213, 0, 8556935, 679185, 10103675, 0, 35258, 526011, 143818, 14789393, 7102085, 24830443, 41202458, 21401594, 30468614, 13002632, 129462, 8089139, 30452, 30452, 8310232, 68266, 6213362, 13501349, 212000, 53778, 361000, 143632, 361000, 142632, 361000, 193728, 361000, 100000]
    

    Getting the maximum Opening Weekend generated by Drama movies from the 'df_4D' dataframe that were released in Winter.

    In [170]:
    max(open_a)
    
    Out[170]:
    85171450

    Getting the minimum Opening Weekend generated by Drama movies from the 'df_4D' dataframe that were released in Winter.

    In [171]:
    min(open_a)
    
    Out[171]:
    0

    Showing how often each release month occurs among the Drama movies from the 'df_4D' dataframe that were released in Winter.

    In [208]:
    print(Counter(month_a))
    
    Counter({12: 27, 2: 17, 1: 14})
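    The raw counts are easier to compare as shares of the 58 Winter releases. A minimal sketch, using the counts copied from the output above; the 'shares' name is illustrative.

```python
from collections import Counter

# Counts copied from the Counter output above (Winter releases by month)
month_counts = Counter({12: 27, 2: 17, 1: 14})
total = sum(month_counts.values())

# most_common() lists the months from most to least frequent
shares = {month: round(100 * n / total, 1) for month, n in month_counts.most_common()}
print(shares)
```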
    

    Showing how often each Budget value occurs among the Drama movies from the 'df_4D' dataframe that were released in Winter.

    In [172]:
    print(Counter(budget_a))
    
    Counter({15000000: 5, 6500000: 4, 40000000: 3, 12000000: 3, 55000000: 2, 1000000: 2, 17000000: 2, 16000000: 2, 25000000: 2, 5000000: 2, 100000000: 1, 52500000: 1, 13000000: 1, 11000000: 1, 7000000: 1, 4900000: 1, 3500000: 1, 100000: 1, 2700000: 1, 1700000: 1, 422000: 1, 11800000: 1, 23000000: 1, 3000000: 1, 20000000: 1, 14000000: 1, 8200000: 1, 666000: 1, 60000000: 1, 75000000: 1, 50000000: 1, 37000000: 1, 36000000: 1, 35000000: 1, 30000000: 1, 28000000: 1, 24000000: 1, 9700000: 1, 9000000: 1, 7400000: 1, 1250000: 1})
    

    Using the 'round_to_multiple' function to round the Budget of Drama movies that were released in Winter to the nearest $1 Million.

    In [220]:
    budg_a_one = []
    for i in group_one_index:budg_a_one.append(round_to_multiple(df_4D['Budget'][i],1000000))
    print(budg_a_one)#showing the budg_a_one list
    
    [20000000, 20000000, 12000000, 5000000, 2000000, 12000000, 5000000, 5000000, 15000000, 32000000, 30000000, 0, 15000000, 10000000, 5000000, 8000000, 7000000, 15000000, 30000000, 45000000, 20000000, 15000000, 12000000, 6000000, 2000000]
    

    Showing the frequency of the rounded Budget values of the Drama movies from the 'Drama_DataFrame' dataframe that were released in Winter.

    In [221]:
    print(Counter(budg_a_one))
    
    Counter({5000000: 4, 15000000: 4, 20000000: 3, 12000000: 3, 2000000: 2, 30000000: 2, 32000000: 1, 0: 1, 10000000: 1, 8000000: 1, 7000000: 1, 45000000: 1, 6000000: 1})
    

    Getting the index of Drama movies from the 'df_4D' dataframe that were released in Winter and that have a Budget of $100,000 to $20 Million.

    In [239]:
    group_one_index = []
    for i in cluster_a_index:
        if 0 <= df_4D['Budget'][i] <= 20000000:group_one_index.append(i)
    print(group_one_index)#showing the group_one_index list
    
    [16, 19, 23, 27, 29, 32, 44, 52, 53, 61, 69, 70, 75, 84, 85, 88, 92, 93, 94, 100, 112, 121, 163, 165, 166, 170, 176, 178, 179, 183, 185, 191, 198, 200, 202, 203, 205, 211, 219]
    

    Checking the number of elements in the 'group_one_index' list.

    In [240]:
    len(group_one_index)
    
    Out[240]:
    39

    Getting the index of Drama movies from the 'df_4D' dataframe that were released in Winter and that have a Budget of $21 Million to $100 Million.

    In [242]:
    group_two_index = []
    for i in cluster_a_index:
        if 20000001 <= df_4D['Budget'][i] :group_two_index.append(i)  
    print(group_two_index)#showing the group_two_index list
    
    [0, 3, 4, 6, 7, 67, 80, 118, 128, 132, 138, 141, 142, 143, 147, 149, 152, 156, 157]
    

    Checking the number of elements in the 'group_two_index' list.

    In [266]:
    len(group_two_index)
    
    Out[266]:
    19
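    The split into a low-budget and a high-budget group can be cross-checked directly on the budget values. The sketch below applies the same $20 Million threshold to the 'budget_a' list printed earlier; the names 'THRESHOLD', 'low_budget' and 'high_budget' are illustrative.

```python
# Budgets copied from the 'budget_a' list printed earlier in this section
budget_a = [100000000, 55000000, 55000000, 52500000, 40000000, 13000000,
            12000000, 11000000, 7000000, 4900000, 3500000, 1000000, 100000,
            2700000, 1700000, 40000000, 422000, 11800000, 17000000, 23000000,
            16000000, 3000000, 20000000, 14000000, 15000000, 12000000, 8200000,
            666000, 60000000, 17000000, 75000000, 50000000, 40000000, 37000000,
            36000000, 35000000, 30000000, 28000000, 25000000, 25000000,
            24000000, 16000000, 15000000, 15000000, 12000000, 9700000, 9000000,
            7400000, 5000000, 5000000, 6500000, 15000000, 6500000, 15000000,
            6500000, 1000000, 6500000, 1250000]

THRESHOLD = 20_000_000  # split point used in the cells above

# Every budget lands in exactly one of the two groups
low_budget = [b for b in budget_a if b <= THRESHOLD]
high_budget = [b for b in budget_a if b > THRESHOLD]
print(len(low_budget), len(high_budget))
```

    The two group sizes should add up to the 58 Winter releases, which is a quick way to confirm that no movie was dropped or counted twice.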

    Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $100,000 to $20 Million.

    In [257]:
    open_a_one = []
    for i in group_one_index:open_a_one.append(df_4D['Opening_Weekend'][i])
    print(open_a_one)#showing the open_a_one list
    
    [1443809, 224476, 47122, 24587, 473882, 8800230, 193728, 2105729, 63356, 44542, 0, 12177488, 22564512, 721341, 82601, 5609875, 93005, 89213, 0, 8556935, 679185, 0, 8089139, 30452, 30452, 8310232, 68266, 6213362, 13501349, 212000, 53778, 361000, 143632, 361000, 142632, 361000, 193728, 361000, 100000]
    

    Checking the number of elements in the 'open_a_one' list.

    In [258]:
    len(open_a_one)
    
    Out[258]:
    39

    Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $21 Million to $100 Million.

    In [263]:
    open_a_two = []
    for i in group_two_index:open_a_two.append(df_4D['Opening_Weekend'][i])
    print(open_a_two)#showing the open_a_two list
    
    [30122888, 46607250, 38560195, 24400000, 85171450, 16755310, 14466, 10103675, 35258, 526011, 143818, 14789393, 7102085, 24830443, 41202458, 21401594, 30468614, 13002632, 129462]
    

    Checking the number of elements in the 'open_a_two' list.

    In [264]:
    len(open_a_two)
    
    Out[264]:
    19

    Showing the frequency of the Opening Weekend values, rounded to the nearest $1 Million, of the Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $100,000 to $20 Million. The rounded values are stored in a list called 'open_a_one1'.

    In [256]:
    open_a_one1 = [round_to_multiple(df_4D['Opening_Weekend'][i], 1000000) for i in group_one_index]
    
    Counter(open_a_one1)
    
    Out[256]:
    Counter({1000000: 3,
             0: 26,
             9000000: 2,
             2000000: 1,
             12000000: 1,
             23000000: 1,
             6000000: 2,
             8000000: 2,
             14000000: 1})

    The maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $100,000 to $20 Million is about $22.6 Million.

    In [261]:
    max(open_a_one)
    
    Out[261]:
    22564512

    The minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $100,000 to $20 Million is $0.

    In [262]:
    min(open_a_one)
    
    Out[262]:
    0
    In [274]:
    # Opening Weekends up to $5 Million: 30 of 39 movies (77%)
    for i in open_a_one:
        if 0 <= i <= 5000000:
            print(i)
    
    1443809
    224476
    47122
    24587
    473882
    193728
    2105729
    63356
    44542
    0
    721341
    82601
    93005
    89213
    0
    679185
    0
    30452
    30452
    68266
    212000
    53778
    361000
    143632
    361000
    142632
    361000
    193728
    361000
    100000
    
    In [275]:
    # Opening Weekends above $5 Million, up to $10 Million: 6 of 39 movies (15%)
    for i in open_a_one:
        if 5000001 <= i <= 10000000:
            print(i)
    
    8800230
    5609875
    8556935
    8089139
    8310232
    6213362
    
    In [276]:
    # Opening Weekends above $10 Million, up to $15 Million: 2 of 39 movies (5%)
    for i in open_a_one:
        if 10000001 <= i <= 15000000:
            print(i)
    
    12177488
    13501349
    
    In [278]:
    # Opening Weekends above $20 Million: 1 of 39 movies (3%)
    for i in open_a_one:
        if 20000001 <= i:
            print(i)
    
    22564512
    

    Showing the frequency of the Opening Weekend values, rounded to the nearest $10 Million, of the Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $21 Million to $100 Million. The rounded values are stored in a separate list called 'open_a_two1', so the unrounded 'open_a_two' list is not overwritten.

    In [255]:
    open_a_two1 = [round_to_multiple(df_4D['Opening_Weekend'][i], 10000000) for i in group_two_index]
    
    Counter(open_a_two1)
    
    Out[255]:
    Counter({30000000: 2,
             50000000: 1,
             40000000: 2,
             20000000: 4,
             90000000: 1,
             0: 5,
             10000000: 4})

    The maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $21 Million to $100 Million is about $85.2 Million.

    In [265]:
    max(open_a_two)
    
    Out[265]:
    85171450

    The minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $21 Million to $100 Million is $14,466.

    In [266]:
    min(open_a_two)
    
    Out[266]:
    14466
    In [282]:
    # Opening Weekends up to $20 Million: 10 of 19 movies (53%)
    for i in open_a_two:
        if 0 <= i <= 20000000:
            print(i)
    
    16755310
    14466
    10103675
    35258
    526011
    143818
    14789393
    7102085
    13002632
    129462
    
    In [283]:
    # Opening Weekends above $20 Million, up to $40 Million: 6 of 19 movies (32%)
    for i in open_a_two:
        if 20000001 <= i <= 40000000:
            print(i)
    
    30122888
    38560195
    24400000
    24830443
    21401594
    30468614
    
    In [284]:
    # Opening Weekends above $40 Million, up to $60 Million: 2 of 19 movies (11%)
    for i in open_a_two:
        if 40000001 <= i <= 60000000:
            print(i)
    
    46607250
    41202458
    
    In [286]:
    # Opening Weekends above $80 Million, up to $100 Million: 1 of 19 movies (5%)
    for i in open_a_two:
        if 80000001 <= i <= 100000000:
            print(i)
    
    85171450
    

    Getting the index of all the Drama movies in the 'df_4D' dataframe that were released in Spring.

    In [175]:
    cluster_b_index = [i for i, x in enumerate(df_4D.Season) if x == 2]
    print(cluster_b_index)  # showing the cluster_b_index list
    
    [2, 12, 17, 21, 35, 36, 40, 43, 46, 47, 50, 59, 64, 74, 78, 79, 83, 96, 99, 101, 102, 104, 111, 115, 116, 117, 122, 126, 139, 144, 145, 148, 153, 155, 161, 162, 164, 174, 175, 180, 184, 186, 192, 194, 195, 197, 207, 218, 220, 224]
    

    Checking the number of elements in the 'cluster_b_index' list.

    In [288]:
    len(cluster_b_index)
    
    Out[288]:
    50

    Using the indexes from the 'cluster_b_index' list to get the Season, Budget, Opening Weekend and Month Released of each movie that was released in Spring.

    In [179]:
    season_b = []
    budget_b = []
    open_b = []
    month_b = []
    for i in cluster_b_index:
        season_b.append(df_4D['Season'][i])
        budget_b.append(df_4D['Budget'][i])
        open_b.append(df_4D['Opening_Weekend'][i])
        month_b.append(df_4D['Month_Realesed'][i])
    

    Showing the 'season_b' list.

    In [180]:
    print(season_b)
    
    [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
    

    Showing the 'month_b' list.

    In [181]:
    print(month_b)
    
    [5, 4, 4, 3, 5, 3, 4, 3, 5, 4, 5, 3, 3, 4, 3, 3, 5, 5, 3, 5, 4, 3, 5, 3, 4, 4, 3, 5, 4, 5, 4, 4, 4, 4, 4, 3, 3, 5, 4, 4, 3, 5, 3, 4, 4, 3, 4, 5, 3, 4]
    

    Showing the 'budget_b' list.

    In [182]:
    print(budget_b)
    
    [60000000, 22500000, 13000000, 12000000, 3000000, 2000000, 2000000, 1500000, 1000000, 135000, 8500000, 20000000, 95000000, 8000000, 20000000, 2000000, 10000000, 17000000, 4500000, 28000000, 700000, 22000000, 2500000, 22000000, 18000000, 8200000, 10000000, 1700000, 38000000, 35000000, 34000000, 30000000, 25000000, 25000000, 17000000, 17000000, 16000000, 10000000, 10000000, 7000000, 5000000, 2600000, 12500000, 20000, 955472, 9000000, 3565572, 6500000, 12000, 612072]
    

    Getting the maximum Budget of the Drama movies from the 'df_4D' dataframe that were released in Spring.

    In [184]:
    max(budget_b)
    
    Out[184]:
    95000000

    Getting the minimum Budget of the Drama movies from the 'df_4D' dataframe that were released in Spring.

    In [185]:
    min(budget_b)
    
    Out[185]:
    12000

    Showing the 'open_b' list.

    In [186]:
    print(open_b)
    
    [14953664, 1220335, 237264, 160547, 246914, 6661234, 81006, 3762145, 63461, 36134, 118150, 16007426, 67877361, 6011585, 16007426, 9244641, 124011, 16015408, 46977, 5088381, 0, 16021684, 0, 16021684, 4625583, 0, 0, 0, 16842353, 372920, 13019686, 13203458, 22618358, 9783603, 9851102, 15002635, 20874072, 11727390, 2215891, 446380, 4690214, 55438, 69100, 0, 0, 738339, 24286, 738339, 70188, 0]
    

    Getting the maximum Opening Weekend generated by Drama movies from the 'df_4D' dataframe that were released in Spring.

    In [187]:
    max(open_b)
    
    Out[187]:
    67877361

    Getting the minimum Opening Weekend generated by Drama movies from the 'df_4D' dataframe that were released in Spring.

    In [188]:
    min(open_b)
    
    Out[188]:
    0

    Showing how often each release month occurs among the Drama movies from the 'df_4D' dataframe that were released in Spring.

    In [209]:
    print(Counter(month_b))
    
    Counter({4: 20, 3: 17, 5: 13})
    

    Showing how often each Budget value occurs among the Drama movies from the 'df_4D' dataframe that were released in Spring.

    In [210]:
    print(Counter(budget_b))
    
    Counter({10000000: 4, 2000000: 3, 17000000: 3, 20000000: 2, 22000000: 2, 25000000: 2, 60000000: 1, 22500000: 1, 13000000: 1, 12000000: 1, 3000000: 1, 1500000: 1, 1000000: 1, 135000: 1, 8500000: 1, 95000000: 1, 8000000: 1, 4500000: 1, 28000000: 1, 700000: 1, 2500000: 1, 18000000: 1, 8200000: 1, 1700000: 1, 38000000: 1, 35000000: 1, 34000000: 1, 30000000: 1, 16000000: 1, 7000000: 1, 5000000: 1, 2600000: 1, 12500000: 1, 20000: 1, 955472: 1, 9000000: 1, 3565572: 1, 6500000: 1, 12000: 1, 612072: 1})
    

    Using the 'round_to_multiple' function to round the Budget of Drama movies that were released in Spring to the nearest $5 Million.

    In [228]:
    bud_b_one = []
    for i in cluster_b_index:bud_b_one.append(round_to_multiple(df_4D['Budget'][i],5000000))
    print(bud_b_one)#showing the bud_b_one list
    
    [60000000, 20000000, 15000000, 10000000, 5000000, 0, 0, 0, 0, 0, 10000000, 20000000, 95000000, 10000000, 20000000, 0, 10000000, 15000000, 5000000, 30000000, 0, 20000000, 0, 20000000, 20000000, 10000000, 10000000, 0, 40000000, 35000000, 35000000, 30000000, 25000000, 25000000, 15000000, 15000000, 15000000, 10000000, 10000000, 5000000, 5000000, 5000000, 10000000, 0, 0, 10000000, 5000000, 5000000, 0, 0]
    

    Showing the frequency of the rounded Budget values of the Drama movies from the 'Drama_DataFrame' dataframe that were released in Spring.

    In [229]:
    print(Counter(bud_b_one))
    
    Counter({0: 13, 10000000: 10, 5000000: 7, 20000000: 6, 15000000: 5, 30000000: 2, 35000000: 2, 25000000: 2, 60000000: 1, 95000000: 1, 40000000: 1})
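    A Counter prints its entries by frequency, which makes the shape of the budget distribution hard to read. The sketch below sorts the same counts by budget and draws a simple text histogram; the counts are copied from the output above, and the formatting is an illustrative choice.

```python
from collections import Counter

# Counts copied from the Counter output above (Spring budgets rounded to $5 Million)
budget_bins = Counter({0: 13, 10000000: 10, 5000000: 7, 20000000: 6, 15000000: 5,
                       30000000: 2, 35000000: 2, 25000000: 2, 60000000: 1,
                       95000000: 1, 40000000: 1})

# Sort by budget (not by frequency) so the distribution reads from low to high
lines = [f"${bin_value // 1_000_000:>3}M | {'#' * count} ({count})"
         for bin_value, count in sorted(budget_bins.items())]
print("\n".join(lines))
```

    Read this way, most Spring Drama budgets sit at or below the $20 Million bin, which matches the threshold used to split the two budget groups.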
    

    Getting the index of Drama movies from the 'df_4D' dataframe that were released in Spring and that have a Budget of $12,000 to $20 Million.

    In [313]:
    group_one_index = []
    for i in cluster_b_index:
        if 0 <= df_4D['Budget'][i] <= 20000000:group_one_index.append(i)  
    print(group_one_index)#showing the group_one_index list
    
    [17, 21, 35, 36, 40, 43, 46, 47, 50, 59, 74, 78, 79, 83, 96, 99, 102, 111, 116, 117, 122, 126, 161, 162, 164, 174, 175, 180, 184, 186, 192, 194, 195, 197, 207, 218, 220, 224]
    

    Checking the number of elements in the 'group_one_index' list.

    In [276]:
    len(group_one_index)
    
    Out[276]:
    38

    Getting the index of Drama movies from the 'df_4D' dataframe that were released in Spring and that have a Budget of $21 Million to $95 Million.

    In [270]:
    group_two_index = []
    for i in cluster_b_index:
        if 20000001 <= df_4D['Budget'][i] <= 100000000:group_two_index.append(i)  
    print(group_two_index)#showing the group_two_index list
    
    [2, 12, 64, 101, 104, 115, 139, 144, 145, 148, 153, 155]
    

    Checking the number of elements in the 'group_two_index' list.

    In [277]:
    len(group_two_index)
    
    Out[277]:
    12

    Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $12,000 to $20 Million.

    In [271]:
    open_b_one = []
    for i in group_one_index:open_b_one.append(df_4D['Opening_Weekend'][i])
    print(open_b_one)#showing the open_b_one list
    
    [237264, 160547, 246914, 6661234, 81006, 3762145, 63461, 36134, 118150, 16007426, 6011585, 16007426, 9244641, 124011, 16015408, 46977, 0, 0, 4625583, 0, 0, 0, 9851102, 15002635, 20874072, 11727390, 2215891, 446380, 4690214, 55438, 69100, 0, 0, 738339, 24286, 738339, 70188, 0]
    

    Checking the number of elements in the 'open_b_one' list.

    In [304]:
    len(open_b_one)
    
    Out[304]:
    38

    Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $21 Million to $95 Million.

    In [280]:
    open_b_two = []
    for i in group_two_index:open_b_two.append(df_4D['Opening_Weekend'][i])
    print(open_b_two)#showing the open_b_two list
    
    [14953664, 1220335, 67877361, 5088381, 16021684, 16021684, 16842353, 372920, 13019686, 13203458, 22618358, 9783603]
    

    Checking the number of elements in the 'open_b_two' list.

    In [273]:
    len(open_b_two)
    
    Out[273]:
    12

    Showing the frequency of the Opening Weekend values, rounded to the nearest $1 Million, of the Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $12,000 to $20 Million. The rounded values are stored in a separate list called 'open_b_one1', so the unrounded 'open_b_one' list is not overwritten.

    In [317]:
    open_b_one1 = [round_to_multiple(df_4D['Opening_Weekend'][i], 1000000) for i in group_one_index]
    
    Counter(open_b_one1)
    
    Out[317]:
    Counter({0: 22,
             7000000: 1,
             4000000: 1,
             16000000: 3,
             6000000: 1,
             9000000: 1,
             5000000: 2,
             10000000: 1,
             15000000: 1,
             21000000: 1,
             12000000: 1,
             2000000: 1,
             1000000: 2})

    The maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $12,000 to $20 Million is about $20.9 Million.

    In [274]:
    max(open_b_one)
    
    Out[274]:
    20874072

    The minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $12,000 to $20 Million is $0.

    In [275]:
    min(open_b_one)
    
    Out[275]:
    0
    In [308]:
    # Opening Weekends up to $5 Million: 28 of 38 movies (74%)
    for i in open_b_one:
        if 0 <= i <= 5000000:
            print(i)
    
    237264
    160547
    246914
    81006
    3762145
    63461
    36134
    118150
    124011
    46977
    0
    0
    4625583
    0
    0
    0
    2215891
    446380
    4690214
    55438
    69100
    0
    0
    738339
    24286
    738339
    70188
    0
    
    In [309]:
    # opening weekends above $5 Million, up to $10 Million
    for i in open_b_one:
        if 5000001 <= i <=10000000:print(i)
    
    6661234
    6011585
    9244641
    9851102
    
    In [310]:
    # opening weekends above $10 Million, up to $15 Million
    for i in open_b_one:
        if 10000001 <= i <=15000000:print(i)
    
    11727390
    
    In [311]:
    # opening weekends above $15 Million, up to $20 Million
    for i in open_b_one:
        if 15000001 <= i <=20000000:print(i)
    
    16007426
    16007426
    16015408
    15002635
    
    In [312]:
    # opening weekends above $20 Million
    for i in open_b_one:
        if 20000001 <= i :print(i)
    
    20874072
    

    Showing the frequency of the Opening Weekend values of the Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $21 Million to $95 Million. Each value is rounded to the nearest $10 Million and stored in a list called 'open_b_two', which is then counted with 'Counter'.

    In [279]:
    open_b_two = []
    for i in group_two_index:open_b_two.append(round_to_multiple(df_4D['Opening_Weekend'][i],10000000))
    
    Counter(open_b_two)
    
    Out[279]:
    Counter({10000000: 5, 0: 2, 70000000: 1, 20000000: 4})

    The maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $21 Million to $95 Million is about $68 Million.

    In [281]:
    max(open_b_two)
    
    Out[281]:
    67877361

    The minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $21 Million to $95 Million is about $373,000.

    In [306]:
    min(open_b_two)
    
    Out[306]:
    372920
    In [314]:
    # opening weekends from $0 to $20 Million
    for i in open_b_two:
        if 0 <= i <=20000000:print(i)
    
    10000000
    0
    10000000
    20000000
    20000000
    20000000
    0
    10000000
    10000000
    20000000
    10000000
    
    In [318]:
    # opening weekends above $20 Million
    for i in open_b_two:
        if 20000001 <= i :print(i)
    
    70000000
    

    Getting the index of all the movies in the Drama Genre that were released in Summer, from the 'df_4D' dataframe.

    In [190]:
    cluster_c_index = []
    for i,x in enumerate(df_4D.Season):
        if x == 3:cluster_c_index.append(i)
    print(cluster_c_index)#showing the cluster_c_index list
    
    [14, 24, 31, 37, 39, 48, 51, 60, 63, 65, 68, 73, 81, 82, 90, 95, 97, 98, 106, 107, 109, 110, 114, 119, 120, 136, 146, 151, 154, 158, 167, 171, 172, 182, 190, 212, 213, 216, 217]
    

    Checking the number of elements in the 'cluster_c_index' list.

    In [320]:
    len(cluster_c_index)
    
    Out[320]:
    39

    Using the indexes from the 'cluster_c_index' list to get the Season, Budget, Opening Weekend and Month Released of each movie that was released in Summer.

    In [191]:
    season_c = []
    budget_c = []
    open_c = []
    month_c = []
    for i in cluster_c_index:
        season_c.append(df_4D['Season'][i])
        budget_c.append(df_4D['Budget'][i])
        open_c.append(df_4D['Opening_Weekend'][i])
        month_c.append(df_4D['Month_Realesed'][i])
    

    Showing the 'season_c' list.

    In [192]:
    print(season_c)
    
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
    

    Showing the 'month_c' list.

    In [193]:
    print(month_c)
    
    [8, 6, 7, 6, 8, 7, 7, 8, 6, 8, 8, 6, 7, 7, 7, 6, 7, 7, 8, 7, 6, 8, 7, 6, 8, 7, 7, 8, 8, 6, 8, 7, 8, 7, 7, 7, 8, 6, 7]
    

    Showing the 'budget_c' list.

    In [194]:
    print(budget_c)
    
    [20000000, 10000000, 4000000, 2000000, 2000000, 100000, 20000000, 3000000, 10000000, 3000000, 5000000, 40000000, 32000000, 90000000, 5000000, 7500000, 5000000, 22000000, 23000000, 15000000, 70000000, 30000000, 10000000, 45000000, 858000, 44000000, 33000000, 25000000, 25000000, 20000000, 15000000, 12000000, 11000000, 5000000, 175000, 904765, 34000000, 1000000, 1500000]
    

    Getting the maximum Budget of Drama movies from the 'df_4D' dataframe that were released in Summer.

    In [195]:
    max(budget_c)
    
    Out[195]:
    90000000

    Getting the minimum Budget of Drama movies from the 'df_4D' dataframe that were released in Summer.

    In [196]:
    min(budget_c)
    
    Out[196]:
    100000

    Showing the 'open_c' list.

    In [197]:
    print(open_c)
    
    [9700000, 13575172, 387618, 84797, 1767308, 104030, 13307125, 11351389, 0, 11351389, 8146533, 13616196, 24517121, 20584908, 2189966, 2534729, 518795, 12146143, 10028065, 7810481, 21037414, 8742545, 220297, 1586753, 0, 12381585, 11731703, 26044590, 12305016, 18723269, 5079566, 5467084, 187281, 21688103, 77740, 0, 11166687, 0, 85709]
    

    Getting the maximum Opening Weekend generated by Drama movies from the 'df_4D' dataframe that were released in Summer.

    In [198]:
    max(open_c)
    
    Out[198]:
    26044590

    Getting the minimum Opening Weekend generated by Drama movies from the 'df_4D' dataframe that were released in Summer.

    In [199]:
    min(open_c)
    
    Out[199]:
    0

    Showing the frequency of each release month of the Drama movies from the 'df_4D' dataframe that were released in Summer.

    In [211]:
    print(Counter(month_c))
    
    Counter({7: 17, 8: 13, 6: 9})
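
    As a small illustration of the count above, the sketch below feeds the 'month_c' values shown earlier into 'Counter' and prints the months from most to least frequent. The 'calendar' lookup for month names is an addition for readability and is not part of the original notebook.

```python
from collections import Counter
import calendar

# The 'month_c' list exactly as printed above (Summer releases of Drama movies).
month_c = [8, 6, 7, 6, 8, 7, 7, 8, 6, 8, 8, 6, 7, 7, 7, 6, 7, 7, 8, 7,
           6, 8, 7, 6, 8, 7, 7, 8, 8, 6, 8, 7, 8, 7, 7, 7, 8, 6, 7]

month_counts = Counter(month_c)

# most_common() sorts the (month, count) pairs from the highest count to the lowest.
for month, count in month_counts.most_common():
    print(calendar.month_name[month], count)
```

    Sorting with 'most_common' makes the busiest release month stand out without reading the whole dictionary.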
    

    Showing the frequency of each Budget value of the Drama movies from the 'df_4D' dataframe that were released in Summer.

    In [212]:
    print(Counter(budget_c))
    
    Counter({5000000: 4, 20000000: 3, 10000000: 3, 2000000: 2, 3000000: 2, 15000000: 2, 25000000: 2, 4000000: 1, 100000: 1, 40000000: 1, 32000000: 1, 90000000: 1, 7500000: 1, 22000000: 1, 23000000: 1, 70000000: 1, 30000000: 1, 45000000: 1, 858000: 1, 44000000: 1, 33000000: 1, 12000000: 1, 11000000: 1, 175000: 1, 904765: 1, 34000000: 1, 1000000: 1, 1500000: 1})
    

    Using the 'round_to_multiple' function to round the Budget of Drama movies that were released in Summer to the nearest $5 Million.

    In [231]:
    bud_c_one = []
    for i in cluster_c_index:bud_c_one.append(round_to_multiple(df_4D['Budget'][i],5000000))
    print(bud_c_one)#showing the bud_c_one list
    
    [20000000, 10000000, 5000000, 0, 0, 0, 20000000, 5000000, 10000000, 5000000, 5000000, 40000000, 30000000, 90000000, 5000000, 10000000, 5000000, 20000000, 25000000, 15000000, 70000000, 30000000, 10000000, 45000000, 0, 45000000, 35000000, 25000000, 25000000, 20000000, 15000000, 10000000, 10000000, 5000000, 0, 0, 35000000, 0, 0]
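
    The body of 'round_to_multiple' is not shown in this section. A minimal sketch that reproduces the rounded values printed above, assuming the function relies on Python's built-in 'round', could look like the following. The body is an inference from the outputs, not the notebook's actual definition.

```python
def round_to_multiple(number, multiple):
    """Round number to the nearest multiple (hypothetical body, inferred from the outputs)."""
    # Python's round() resolves exact ties to the nearest even integer.
    return multiple * round(number / multiple)

# Budgets taken from the 'budget_c' list, rounded to the nearest $5 Million.
print(round_to_multiple(4000000, 5000000))
print(round_to_multiple(2000000, 5000000))
print(round_to_multiple(23000000, 5000000))
```

    Under this scheme, Budgets below $2.5 Million round down to 0, which is why the rounded list contains zeros even though the smallest Summer Budget is $100,000.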
    

    Showing the frequency of the rounded Budget values of the Drama movies from the 'df_4D' dataframe that were released in Summer.

    In [232]:
    print(Counter(bud_c_one))
    
    Counter({0: 8, 5000000: 7, 10000000: 6, 20000000: 4, 25000000: 3, 30000000: 2, 15000000: 2, 45000000: 2, 35000000: 2, 40000000: 1, 90000000: 1, 70000000: 1})
    

    Getting the index of Drama movies from the 'df_4D' dataframe that were released in Summer and that have a Budget of $100,000 to $20 Million.

    In [311]:
    group_one_index = []
    for i in cluster_c_index:
        if 0 <= df_4D['Budget'][i] <= 20000000:group_one_index.append(i) 
    print(group_one_index)#showing the group_one_index list
    
    [14, 24, 31, 37, 39, 48, 51, 60, 63, 65, 68, 90, 95, 97, 107, 114, 120, 158, 167, 171, 172, 182, 190, 212, 216, 217]
    

    Checking the number of elements in the 'group_one_index' list.

    In [288]:
    len(group_one_index)
    
    Out[288]:
    26

    Getting the index of Drama movies from the 'df_4D' dataframe that were released in Summer and that have a Budget of $21 Million to $90 Million.

    In [284]:
    group_two_index = []
    for i in cluster_c_index:
        if 20000001 <= df_4D['Budget'][i] <= 90000000:group_two_index.append(i)  
    print(group_two_index)#showing the group_two_index list
    
    [73, 81, 82, 98, 106, 109, 110, 119, 136, 146, 151, 154, 213]
    

    Checking the number of elements in the 'group_two_index' list.

    In [289]:
    len(group_two_index)
    
    Out[289]:
    13
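
    The two index-collecting loops above can also be written as one pass over the indexes and Budgets. The sketch below is a hedged alternative: 'split_by_budget' is a hypothetical helper, not part of the notebook, and it uses the 'cluster_c_index' and 'budget_c' values printed earlier. It has no upper bound on the second group, which gives the same result here because the largest Summer Budget is $90 Million.

```python
def split_by_budget(indexes, budgets, threshold=20000000):
    """Split row indexes into (Budget <= threshold, Budget > threshold) groups."""
    low, high = [], []
    for index, budget in zip(indexes, budgets):
        (low if budget <= threshold else high).append(index)
    return low, high

# Values copied from the 'cluster_c_index' and 'budget_c' outputs above.
cluster_c_index = [14, 24, 31, 37, 39, 48, 51, 60, 63, 65, 68, 73, 81, 82, 90,
                   95, 97, 98, 106, 107, 109, 110, 114, 119, 120, 136, 146, 151,
                   154, 158, 167, 171, 172, 182, 190, 212, 213, 216, 217]
budget_c = [20000000, 10000000, 4000000, 2000000, 2000000, 100000, 20000000,
            3000000, 10000000, 3000000, 5000000, 40000000, 32000000, 90000000,
            5000000, 7500000, 5000000, 22000000, 23000000, 15000000, 70000000,
            30000000, 10000000, 45000000, 858000, 44000000, 33000000, 25000000,
            25000000, 20000000, 15000000, 12000000, 11000000, 5000000, 175000,
            904765, 34000000, 1000000, 1500000]

group_one_index, group_two_index = split_by_budget(cluster_c_index, budget_c)
print(len(group_one_index), len(group_two_index))
```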

    Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $100,000 to $20 Million.

    In [296]:
    open_c_one = []
    for i in group_one_index:open_c_one.append(df_4D['Opening_Weekend'][i])
    print(open_c_one)#showing the open_c_one list
    
    [9700000, 13575172, 387618, 84797, 1767308, 104030, 13307125, 11351389, 0, 11351389, 8146533, 2189966, 2534729, 518795, 7810481, 220297, 0, 18723269, 5079566, 5467084, 187281, 21688103, 77740, 0, 0, 85709]
    

    Checking the number of elements in the 'open_c_one' list.

    In [334]:
    len(open_c_one)
    
    Out[334]:
    26

    Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $21 Million to $90 Million.

    In [287]:
    open_c_two = []
    for i in group_two_index:open_c_two.append(df_4D['Opening_Weekend'][i])
    print(open_c_two)#showing the open_c_two list
    
    [13616196, 24517121, 20584908, 12146143, 10028065, 21037414, 8742545, 1586753, 12381585, 11731703, 26044590, 12305016, 11166687]
    

    Checking the number of elements in the 'open_c_two' list.

    In [335]:
    len(open_c_two)
    
    Out[335]:
    13

    Showing the frequency of the Opening Weekend values of the Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $100,000 to $20 Million. Each value is rounded to the nearest $1 Million and stored in a list called 'open_c_one', which is then counted with 'Counter'.

    In [312]:
    open_c_one = []
    for i in group_one_index:open_c_one.append(round_to_multiple(df_4D['Opening_Weekend'][i],1000000))
    
    Counter(open_c_one)
    
    Out[312]:
    Counter({10000000: 1,
             14000000: 1,
             0: 11,
             2000000: 2,
             13000000: 1,
             11000000: 2,
             8000000: 2,
             3000000: 1,
             1000000: 1,
             19000000: 1,
             5000000: 2,
             22000000: 1})

    The maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $100,000 to $20 Million is about $22 Million.

    In [336]:
    max(open_c_one)
    
    Out[336]:
    21688103

    The minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $100,000 to $20 Million is $0.

    In [337]:
    min(open_c_one)
    
    Out[337]:
    0
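
    The dollar figures quoted in the summary sentences are written by hand from the 'max' and 'min' outputs. As a hedged alternative, a small helper can format the raw value directly so that the sentence and the output always agree. 'format_millions' is a hypothetical name and is not part of the notebook.

```python
def format_millions(value):
    """Format a dollar amount in millions with one decimal place."""
    return f"${value / 1000000:.1f} Million"

# Opening Weekend values taken from the max and min outputs above.
print(format_millions(21688103))
print(format_millions(0))
```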
    In [341]:
    # opening weekends from $0 to $5 Million
    for i in open_c_one:
        if 0 <= i <= 5000000:print(i)
    
    387618
    84797
    1767308
    104030
    0
    2189966
    2534729
    518795
    220297
    0
    187281
    77740
    0
    0
    85709
    
    In [342]:
    # opening weekends above $5 Million, up to $10 Million
    for i in open_c_one:
        if 5000001 <= i <=10000000:print(i)
    
    9700000
    8146533
    7810481
    5079566
    5467084
    
    In [343]:
    # opening weekends above $10 Million, up to $15 Million
    for i in open_c_one:
        if 10000001 <= i <=15000000:print(i)
    
    13575172
    13307125
    11351389
    11351389
    
    In [344]:
    # opening weekends above $15 Million, up to $20 Million
    for i in open_c_one:
        if 15000001 <= i <=20000000:print(i)
    
    18723269
    
    In [345]:
    # opening weekends above $20 Million
    for i in open_c_one:
        if 20000001 <= i :print(i)
    
    21688103
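
    The five filter cells above each print one Opening Weekend range. A possible way to get all five counts at once is sketched below, assuming the same $5 Million bins and using the unrounded 'open_c_one' values printed earlier. 'bin_index' is a hypothetical helper, not part of the notebook.

```python
from collections import Counter

BIN_WIDTH = 5000000   # each bin spans $5 Million
LAST_BIN = 4          # everything above $20 Million falls into the last bin

def bin_index(value):
    """Return 0 for $0 to $5M, 1 for above $5M up to $10M, ..., 4 for above $20M."""
    return min(max(0, (value - 1) // BIN_WIDTH), LAST_BIN)

# Unrounded Opening Weekends of Summer Drama movies with a Budget up to $20 Million.
open_c_one = [9700000, 13575172, 387618, 84797, 1767308, 104030, 13307125,
              11351389, 0, 11351389, 8146533, 2189966, 2534729, 518795, 7810481,
              220297, 0, 18723269, 5079566, 5467084, 187281, 21688103, 77740,
              0, 0, 85709]

bin_counts = Counter(bin_index(value) for value in open_c_one)
for index in range(LAST_BIN + 1):
    print(index, bin_counts[index])
```

    The counts per bin match the number of values printed by the five cells above, so the helper can stand in for them when only the totals are needed.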
    

    Showing the frequency of the Opening Weekend values of the Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $21 Million to $90 Million. Each value is rounded to the nearest $10 Million and stored in a list called 'open_c_two', which is then counted with 'Counter'.

    In [346]:
    open_c_two = []
    for i in group_two_index:open_c_two.append(round_to_multiple(df_4D['Opening_Weekend'][i],10000000))
    
    Counter(open_c_two)
    
    Out[346]:
    Counter({10000000: 8, 20000000: 3, 0: 1, 30000000: 1})

    The maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $21 Million to $90 Million is about $26 Million.

    In [293]:
    max(open_c_two)
    
    Out[293]:
    26044590

    The minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $21 Million to $90 Million is about $1.6 Million.

    In [294]:
    min(open_c_two)
    
    Out[294]:
    1586753
    In [347]:
    # opening weekends from $0 to $5 Million
    for i in open_c_two:
        if 0 <= i <=5000000:print(i)
    
    0
    
    In [348]:
    # opening weekends from $5 Million to $10 Million
    for i in open_c_two:
        if 5000000 <= i <=10000000:print(i)
    
    10000000
    10000000
    10000000
    10000000
    10000000
    10000000
    10000000
    10000000
    
    In [350]:
    # opening weekends above $15 Million, up to $20 Million
    for i in open_c_two:
        if 15000001 <= i <=20000000:print(i)
    
    20000000
    20000000
    20000000
    
    In [352]:
    # opening weekends above $25 Million, up to $30 Million
    for i in open_c_two:
        if 25000001 <= i <=30000000:print(i)
    
    30000000
    

    Getting the index of all the movies in the Drama Genre that were released in Autumn, from the 'df_4D' dataframe.

    In [201]:
    cluster_d_index = []
    for i,x in enumerate(df_4D.Season):
        if x == 4:cluster_d_index.append(i)
    print(cluster_d_index)#showing the cluster_d_index list
    
    [1, 5, 8, 9, 10, 11, 13, 15, 18, 20, 22, 25, 26, 28, 30, 33, 34, 38, 41, 42, 45, 49, 54, 55, 56, 57, 58, 62, 66, 71, 72, 76, 77, 86, 87, 89, 91, 103, 105, 108, 113, 123, 124, 125, 127, 129, 130, 131, 133, 134, 135, 137, 140, 150, 159, 160, 168, 169, 173, 177, 181, 187, 188, 189, 193, 196, 199, 201, 204, 206, 208, 209, 210, 214, 215, 221, 222, 223]
    

    Checking the number of elements in the 'cluster_d_index' list.

    In [354]:
    len(cluster_d_index)
    
    Out[354]:
    78

    Using the indexes from the 'cluster_d_index' list to get the Season, Budget, Opening Weekend and Month Released of each movie that was released in Autumn.

    In [202]:
    season_d = []
    budget_d = []
    open_d = []
    month_d = []
    for i in cluster_d_index:
        season_d.append(df_4D['Season'][i])
        budget_d.append(df_4D['Budget'][i])
        open_d.append(df_4D['Opening_Weekend'][i])
        month_d.append(df_4D['Month_Realesed'][i])
    

    Showing the 'season_d' list.

    In [203]:
    print(season_d)
    
    [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
    

    Showing the 'month_d' list.

    In [204]:
    print(month_d)
    
    [10, 10, 9, 11, 10, 11, 11, 10, 10, 9, 11, 11, 11, 10, 9, 10, 10, 10, 10, 9, 10, 9, 10, 9, 11, 9, 11, 10, 11, 10, 10, 11, 9, 11, 10, 10, 9, 11, 11, 10, 10, 11, 10, 9, 10, 9, 11, 11, 10, 11, 11, 10, 11, 10, 9, 11, 10, 9, 11, 10, 9, 9, 11, 10, 10, 9, 10, 10, 10, 9, 9, 9, 10, 10, 11, 9, 10, 10]
    

    Showing the 'budget_d' list.

    In [205]:
    print(budget_d)
    
    [61000000, 55000000, 37500000, 31000000, 23000000, 22500000, 21000000, 20000000, 13000000, 12000000, 11800000, 9400000, 8500000, 5000000, 4750000, 3400000, 3300000, 2000000, 2000000, 1987650, 1000000, 6000000, 11500000, 9000000, 180000000, 37000000, 20000000, 5100000, 20000000, 15000000, 32000000, 30000000, 500000, 15000000, 10000000, 12000000, 7000000, 7000000, 20000000, 2700000, 85000000, 6400000, 13000000, 1750000, 110000000, 60000000, 55000000, 50000000, 50000000, 49000000, 47000000, 40000000, 37000000, 26000000, 20000000, 19000000, 14000000, 13000000, 11000000, 9000000, 6000000, 2000000, 1400000, 250000, 1000000, 1500000, 15000000, 4000000, 4074940, 1000000, 12000000, 15000000, 350000, 230000, 1000000, 15000000, 2200000, 50000]
    

    Getting the maximum Budget of Drama movies from the 'df_4D' dataframe that were released in Autumn.

    In [206]:
    max(budget_d)
    
    Out[206]:
    180000000

    Getting the minimum Budget of Drama movies from the 'df_4D' dataframe that were released in Autumn.

    In [207]:
    min(budget_d)
    
    Out[207]:
    50000

    Showing the frequency of each release month of the Drama movies from the 'df_4D' dataframe that were released in Autumn.

    In [214]:
    print(Counter(month_d))
    
    Counter({10: 35, 11: 23, 9: 20})
    

    Showing the frequency of each Budget value of the Drama movies from the 'df_4D' dataframe that were released in Autumn.

    In [215]:
    print(Counter(budget_d))
    
    Counter({20000000: 5, 15000000: 5, 1000000: 4, 13000000: 3, 12000000: 3, 2000000: 3, 55000000: 2, 6000000: 2, 9000000: 2, 37000000: 2, 7000000: 2, 50000000: 2, 61000000: 1, 37500000: 1, 31000000: 1, 23000000: 1, 22500000: 1, 21000000: 1, 11800000: 1, 9400000: 1, 8500000: 1, 5000000: 1, 4750000: 1, 3400000: 1, 3300000: 1, 1987650: 1, 11500000: 1, 180000000: 1, 5100000: 1, 32000000: 1, 30000000: 1, 500000: 1, 10000000: 1, 2700000: 1, 85000000: 1, 6400000: 1, 1750000: 1, 110000000: 1, 60000000: 1, 49000000: 1, 47000000: 1, 40000000: 1, 26000000: 1, 19000000: 1, 14000000: 1, 11000000: 1, 1400000: 1, 250000: 1, 1500000: 1, 4000000: 1, 4074940: 1, 350000: 1, 230000: 1, 2200000: 1, 50000: 1})
    

    Using the 'round_to_multiple' function to round the Budget of Drama movies that were released in Autumn to the nearest $1 Million.

    In [237]:
    bud_d_one = []
    for i in cluster_d_index:bud_d_one.append(round_to_multiple(df_4D['Budget'][i],1000000))
    print(bud_d_one)#showing the bud_d_one list
    
    [61000000, 55000000, 38000000, 31000000, 23000000, 22000000, 21000000, 20000000, 13000000, 12000000, 12000000, 9000000, 8000000, 5000000, 5000000, 3000000, 3000000, 2000000, 2000000, 2000000, 1000000, 6000000, 12000000, 9000000, 180000000, 37000000, 20000000, 5000000, 20000000, 15000000, 32000000, 30000000, 0, 15000000, 10000000, 12000000, 7000000, 7000000, 20000000, 3000000, 85000000, 6000000, 13000000, 2000000, 110000000, 60000000, 55000000, 50000000, 50000000, 49000000, 47000000, 40000000, 37000000, 26000000, 20000000, 19000000, 14000000, 13000000, 11000000, 9000000, 6000000, 2000000, 1000000, 0, 1000000, 2000000, 15000000, 4000000, 4000000, 1000000, 12000000, 15000000, 0, 0, 1000000, 15000000, 2000000, 0]
    

    Showing the frequency of the rounded Budget values of the Drama movies from the 'df_4D' dataframe that were released in Autumn.

    In [238]:
    print(Counter(bud_d_one))
    
    Counter({2000000: 7, 20000000: 5, 12000000: 5, 1000000: 5, 15000000: 5, 0: 5, 13000000: 3, 9000000: 3, 5000000: 3, 3000000: 3, 6000000: 3, 55000000: 2, 37000000: 2, 7000000: 2, 50000000: 2, 4000000: 2, 61000000: 1, 38000000: 1, 31000000: 1, 23000000: 1, 22000000: 1, 21000000: 1, 8000000: 1, 180000000: 1, 32000000: 1, 30000000: 1, 10000000: 1, 85000000: 1, 110000000: 1, 60000000: 1, 49000000: 1, 47000000: 1, 40000000: 1, 26000000: 1, 19000000: 1, 14000000: 1, 11000000: 1})
    

    Getting the index of Drama movies from the 'df_4D' dataframe that were released in Autumn and that have a Budget of $50,000 to $20 Million.

    In [297]:
    group_one_index = []
    for i in cluster_d_index:
        if 0 <= df_4D['Budget'][i] <=20000000:group_one_index.append(i) 
    print(group_one_index)#showing the group_one_index list
    
    [15, 18, 20, 22, 25, 26, 28, 30, 33, 34, 38, 41, 42, 45, 49, 54, 55, 58, 62, 66, 71, 77, 86, 87, 89, 91, 103, 105, 108, 123, 124, 125, 159, 160, 168, 169, 173, 177, 181, 187, 188, 189, 193, 196, 199, 201, 204, 206, 208, 209, 210, 214, 215, 221, 222, 223]
    

    Checking the number of elements in the 'group_one_index' list.

    In [298]:
    len(group_one_index)
    
    Out[298]:
    56

    Getting the index of Drama movies from the 'df_4D' dataframe that were released in Autumn and that have a Budget of $21 Million to $200 Million.

    In [299]:
    group_two_index = []
    for i in cluster_d_index:
        if 20000001 <= df_4D['Budget'][i] :group_two_index.append(i)  
    print(group_two_index)#showing the group_two_index list
    
    [1, 5, 8, 9, 10, 11, 13, 56, 57, 72, 76, 113, 127, 129, 130, 131, 133, 134, 135, 137, 140, 150]
    

    Checking the number of elements in the 'group_two_index' list.

    In [300]:
    len(group_two_index)
    
    Out[300]:
    22

    Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $50,000 to $20 Million.

    In [302]:
    open_d_one = []
    for i in group_one_index:open_d_one.append(df_4D['Opening_Weekend'][i])
    print(open_d_one)#showing the open_d_one list
    
    [5100000, 118298, 2002165, 253510, 257174, 256498, 7485546, 52041, 561906, 135388, 156833, 18623, 100268, 137651, 170335, 2337594, 287081, 27547866, 1203011, 27547866, 5268764, 6836036, 1528982, 2739680, 298277, 89054, 2914486, 162146, 0, 0, 0, 0, 4765838, 105005, 76244, 228359, 15679190, 14065500, 4750894, 9112839, 20321, 128140, 0, 85709, 63918, 100316, 100316, 649423, 11014818, 63918, 25775847, 31665, 245398, 63918, 130303, 0]
    

    Checking the number of elements in the 'open_d_one' list.

    In [303]:
    len(open_d_one)
    
    Out[303]:
    56

    Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $21 Million to $200 Million.

    In [304]:
    open_d_two = []
    for i in group_two_index:open_d_two.append(df_4D['Opening_Weekend'][i])
    print(open_d_two)#showing the open_d_two list
    
    [37513109, 13143310, 736311, 24900566, 10470145, 492648, 19497324, 11364505, 19152401, 9178233, 9421369, 11457353, 55785112, 22403596, 11947744, 35574710, 220522, 320690, 24074047, 15371203, 29632823, 10003827]
    

    Checking the number of elements in the 'open_d_two' list.

    In [305]:
    len(open_d_two)
    
    Out[305]:
    22

    Showing the frequency of the Opening Weekend values of the Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $50,000 to $20 Million. Each value is rounded to the nearest $1 Million and stored in a list called 'open_d_one', which is then counted with 'Counter'.

    In [310]:
    open_d_one = []
    for i in group_one_index:open_d_one.append(round_to_multiple(df_4D['Opening_Weekend'][i],1000000))
    
    Counter(open_d_one)
    
    Out[310]:
    Counter({5000000: 4,
             0: 35,
             2000000: 3,
             7000000: 2,
             1000000: 3,
             28000000: 2,
             3000000: 2,
             16000000: 1,
             14000000: 1,
             9000000: 1,
             11000000: 1,
             26000000: 1})

    The maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $50,000 to $20 Million is about $28 Million.

    In [306]:
    max(open_d_one)
    
    Out[306]:
    27547866

    The minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $50,000 to $20 Million is $0.

    In [307]:
    min(open_d_one)
    
    Out[307]:
    0
    In [375]:
    # opening weekends from $0 to $5 Million
    for i in open_d_one:
        if 0 <= i <=5000000:print(i)
    
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    
    In [376]:
    # opening weekends above $5 Million, up to $10 Million
    for i in open_d_one:
        if 5000001 <= i <=10000000:print(i)
    
    10000000
    10000000
    10000000
    10000000
    10000000
    10000000
    10000000
    
    In [378]:
    # opening weekends from $15 Million to $20 Million
    for i in open_d_one:
        if 15000000 <= i <=20000000:print(i)
    
    20000000
    
    In [379]:
    # opening weekends from $20 Million to $25 Million
    for i in open_d_one:
        if 20000000 <= i <=25000000:print(i)
    
    20000000
    
    In [380]:
    # opening weekends above $25 Million
    for i in open_d_one:
        if 25000001 <= i :print(i)
    
    30000000
    30000000
    30000000
    

    Showing the frequency of the Opening Weekend values of the Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $21 Million to $200 Million. Each value is rounded to the nearest $10 Million and stored in a list called 'open_d_two', which is then counted with 'Counter'.

    In [381]:
    open_d_two = []
    for i in group_two_index:open_d_two.append(round_to_multiple(df_4D['Opening_Weekend'][i],10000000))
    
    Counter(open_d_two)
    
    Out[381]:
    Counter({40000000: 2,
             10000000: 8,
             0: 4,
             20000000: 6,
             60000000: 1,
             30000000: 1})

    The maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $21 Million to $200 Million is about $56 Million.

    In [373]:
    max(open_d_two)
    
    Out[373]:
    55785112

    The minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $21 Million to $200 Million is about $220,000.

    In [374]:
    min(open_d_two)
    
    Out[374]:
    220522
    In [382]:
    # opening weekends from $0 to $10 Million
    for i in open_d_two:
        if 0 <= i <=10000000:print(i)
    
    10000000
    0
    10000000
    0
    10000000
    10000000
    10000000
    10000000
    10000000
    0
    0
    10000000
    
    In [383]:
    # opening weekends above $10 Million, up to $20 Million
    for i in open_d_two:
        if 10000001 <= i <=20000000:print(i)
    
    20000000
    20000000
    20000000
    20000000
    20000000
    20000000
    
    In [384]:
    # opening weekends above $20 Million, up to $30 Million
    for i in open_d_two:
        if 20000001 <= i <=30000000:print(i)
    
    30000000
    
    In [385]:
    # opening weekends above $30 Million, up to $40 Million
    for i in open_d_two:
        if 30000001 <= i <=40000000:print(i)
    
    40000000
    40000000
    
    In [386]:
    # opening weekends above $40 Million
    for i in open_d_two:
        if 40000001 <= i :print(i)
    
    60000000
    

    Creating the df_profit_season dataframe.

    In [327]:
    df_profit_season = pd.DataFrame({'Budget':r_cost+pg_cost+g_cost+pg13_cost+nc17_cost,
                       'Season':season_r+season_pg+season_g+season_pg13+season_nc17,
                       "Profit":profit_int+profit_int1+profit_int2+profit_int3+profit_int4,
                       "Name":name+name1+name2+name3+name4
                       })
    

    The 'df_profit_season' dataframe.

    In [328]:
    df_profit_season
    
    Out[328]:
    Budget Season Profit Name

    Creating a 3D scatter plot of the Budget, Season and Profit of the movies that are in the Drama Genre from the 'Drama_DataFrame' dataframe, using the 'animate' function and 'animation.FuncAnimation' to create an animated 3D scatter plot object.

    In [451]:
    def animate(i):
        # azimuth angle : 0 deg to 360 deg
        ax.view_init(elev=10, azim=i*4)
        return fig
    
    
    
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')  # replaces the deprecated Axes3D(fig)
    
    cluster = ax.scatter(df['Budget'],df['Season'],df['Profit'], alpha=0.5,s=50, color='#bd1783')
    ax.set_zlim3d(0,800000000)
    ax.set_xlabel('Budget')
    ax.set_ylabel('Season')
    ax.set_zlabel('Profit')
    
    ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
    ani
    
    Out[451]:
    <matplotlib.animation.FuncAnimation at 0x2bfc4e7a370>

    Saving the animated 3D scatter plot gif as 'drama8.gif'.

    In [452]:
    writergif = animation.PillowWriter(fps=10)  # same frame rate the original save call requested
    ani.save('drama8.gif', writer=writergif)
    

    The fourth 3D Scatter Plot (part A): the x-axis is the 'Budget', the y-axis is the 'Season' and the z-axis is the 'Profit'. The purpose of this animation is to partition the movies in the Drama Genre from the 'Drama_DataFrame' dataframe into clusters based on seasons. The Profit of each cluster will be summed and averaged, and the Standard Deviation of each season's Profit will be used to see which season is the most profitable and the most consistent.

    Creating a 3D scatter plot of the Budget, Season and Profit of the movies that are in the Drama Genre from the 'Drama_DataFrame' dataframe, with one colour per season, using the 'animate' function and 'animation.FuncAnimation' to create an animated 3D scatter plot object.

    In [454]:
    def animate(i):
        # azimuth angle : 0 deg to 360 deg
        ax.view_init(elev=10, azim=i*4)
        return fig
    
    
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')  # replaces the deprecated Axes3D(fig)
    
    # one dataframe per season
    df1 = df[df.Season==1]
    df2 = df[df.Season==2]
    df3 = df[df.Season==3]
    df4 = df[df.Season==4]
    
    x1 = df1['Budget']
    y1 = df1['Season']
    z1 = df1['Profit']
    
    x2 = df2['Budget']
    y2 = df2['Season']
    z2 = df2['Profit']
    
    x3 = df3['Budget']
    y3 = df3['Season']
    z3 = df3['Profit']
    
    x4 = df4['Budget']
    y4 = df4['Season']
    z4 = df4['Profit']
    
    # one scatter per season, each in its own shade
    scatter = ax.scatter(x1,y1,z1, alpha=0.5,s=50, color = '#ed93cd')
    scatter = ax.scatter(x2,y2,z2, alpha=0.5,s=50, color = '#a64885')
    scatter = ax.scatter(x3,y3,z3, alpha=0.5,s=50, color = '#96276f')
    scatter = ax.scatter(x4,y4,z4, alpha=0.5,s=50, color = '#780d53')
    ax.set_zlim3d(0,800000000)
    
    ax.set_xlabel('Budget')
    ax.set_ylabel('Season')
    ax.set_zlabel('Profit')
    
    ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
    ani
    
    Out[454]:
    <matplotlib.animation.FuncAnimation at 0x2bfbb947fa0>

    Saving the animated 3D scatter plot gif as 'drama9.gif'.

    In [455]:
    writergif = animation.PillowWriter(fps=10)  # same frame rate the original save call requested
    ani.save('drama9.gif', writer=writergif)
    

    The fourth 3D Scatter Plot (part B): the x-axis is the 'Budget', the y-axis is the 'Season' and the z-axis is the 'Profit'. The purpose of this animation is to partition the movies in the Drama Genre from the 'Drama_DataFrame' dataframe into clusters based on seasons. The Profit of each cluster will be summed and averaged, and the Standard Deviation of each season's Profit will be used to see which season is the most profitable and the most consistent.

    Getting the Names and Profit of movies that were released in the Winter.

    In [329]:
    profit_1 = []
    name_1 = []
    for i,x in enumerate(df_profit_season.Season):
        if x == 1:profit_1.append(df_profit_season.Profit[i])
        
    for i,x in enumerate(df_profit_season.Season):
        if x == 1:name_1.append(df_profit_season.Name[i])
    

    Showing the 'profit_1' list.

    In [333]:
    print(profit_1)
    
    [349948323, 326398492, 316350619, 82112435, 530998101, 318266710, 7859167, 45178935, 3765283, 12636004, 36954520, 15566240, 1851683, 556082, 10531500, 176601214, 26696000, 35694916, 120587063, 83269971, 118582776, 3101815, 107956187, 21856053, 104285432, 28716963, 71808942, 3851000, 30482317, 55071636, 559454789, 129748880, 129590606, 60143987, 49309093, 217276928, 167618160, 66050951, 117033509, 57917283, 40282881, 36545707, 113955898, 5601987, 20909437, 27087044, 12971021, 23787727, 36699612, 1205034, 13912841, 121165, 13912841, 307113, 13912841, 15566240, 13912841, 34897711]
    

    Checking the number of elements in the 'profit_1' list.

    In [408]:
    len(profit_1)
    
    Out[408]:
    58

    Showing the 'name_1' list.

    In [334]:
    print(name_1)
    
    ['Django Unchained', 'Fifty Shades Darker', 'Fifty Shades Freed', 'Zero Dark Thirty', 'Fifty Shades of Grey', 'Black Swan', 'If Beale Street Could Talk', 'Quartet', 'We Need to Talk About Kevin', 'Mommy', 'The Witch', 'Blue Valentine', 'Ghost Story', 'Zoot Suit', 'The Lunchbox', 'Little Women', 'The Jazz Singer', 'A Walk to Remember', 'Bridge to Terabithia', "Mr. Holland's Opus", 'Sense and Sensibility', 'The Secret of Roan Inish', 'Forever Young', 'Taps', 'On Golden Pond', 'Absence of Malice', 'Footloose', 'Lassie Come Home', 'The Tale of Despereaux', 'My Fair Lady 1964', 'Sing', 'The Post', 'The Impossible', 'The Rite', 'Collateral Beauty', 'True Grit', 'The Vow', 'Safe Haven', 'Dear John', 'Rings', 'Fences', 'The Roommate', 'The Woman in Black', 'Country Strong', 'Project Almanac', 'Amour', 'Black or White', 'The Bye Bye Man', 'Still Alice', 'Rabbit Hole', 'Shame', 'The Dreamers', 'Shame', 'The Dreamers', 'Shame', 'Blue Valentine', 'Shame', 'Last Tango in Paris']
    

    Checking the number of elements in the 'name_1' list.

    In [409]:
    len(name_1)
    
    Out[409]:
    58
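
    The two loops per season can also be written as a single pass that keeps each name paired with its profit. The sketch below uses plain lists as hypothetical stand-ins for the 'Season', 'Name' and 'Profit' columns, and `select_season` is an illustrative helper, not part of the notebook. Iterating with `zip` avoids looking rows up by position, which only matches the row labels when the dataframe has a default 0..n-1 index.

```python
# Hypothetical stand-ins for the df_profit_season columns (assumed equal length).
seasons = [1, 2, 1, 4, 3, 1]
names = ['Django Unchained', 'Priest', 'Black Swan', 'Gone Girl', 'The Debt', 'Quartet']
profits = [349948323, 24154026, 318266710, 307567189, 26604054, 45178935]

def select_season(season_code, seasons, names, profits):
    """Collect names and profits for one season in a single pass."""
    picked_names, picked_profits = [], []
    for season, name, profit in zip(seasons, names, profits):
        if season == season_code:
            picked_names.append(name)
            picked_profits.append(profit)
    return picked_names, picked_profits

name_1, profit_1 = select_season(1, seasons, names, profits)
print(name_1)
print(profit_1)
```

    With the real dataframe, the same function could be called as `select_season(1, df_profit_season.Season, df_profit_season.Name, df_profit_season.Profit)`, and the two returned lists always have the same length by construction.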

    Getting the Names and Profit of movies that were released in the Spring (Season code 2).

    In [330]:
    profit_2 = []
    name_2 = []
    for i,x in enumerate(df_profit_season.Season):
        if x == 2:profit_2.append(df_profit_season.Profit[i])
        
    for i,x in enumerate(df_profit_season.Season):
        if x == 2:name_2.append(df_profit_season.Name[i])
    

    Showing the 'profit_2' list.

    In [336]:
    print(profit_2)
    
    [24154026, 8554727, 25358392, 34913, 20251930, 14610760, 88390, 12744931, 156309, 294448, 68711836, 72678948, 447351353, 10948425, 69137047, 62667874, 3835130, 108052686, 3943124, 20000000, 1711143, 58693537, 1250000, 58491516, 293281000, 278014195, 37707417, 10300000, 78809717, 26721826, 29802928, 38984536, 71633833, 4847480, 317522294, 21028230, 40506120, 51603136, 21556959, 29964656, 13945682, 12698355, 4856268, 257845, 659312, 89410061, 256669, 94673038, 401802, 858737]
    

    Checking the number of elements in the 'profit_2' list.

    In [411]:
    len(profit_2)
    
    Out[411]:
    50

    Showing the 'name_2' list.

    In [337]:
    print(name_2)
    
    ['Priest', 'The Water Diviner', 'Ex Machina', 'Stoker', 'Before Midnight', 'Silent House', 'Locke', 'Unsane', 'Palo Alto', 'Sound of My Voice', 'Fame', 'The Last Song', 'Cinderella', 'Akeelah and the Bee', 'The Last Song', "God's Not Dead", 'The Spanish Prisoner', 'Rocky III', 'Tender Mercies', 'The Natural', 'A Sunday in the Country', 'The Rookie', 'Pollyanna', 'The Rookie', 'The Secret Garden', 'The Sound of Music', "Hachiko: A Dog's Story", 'Three Cions in the Fountain', 'Water for Elephants', 'The Tree of Life', 'The Longest Ride', 'The Age of Adaline', 'The Lucky One', 'Draft Day', 'A Quiet Place', 'Beastly', 'Remember Me', 'Everything, Everything', 'Mud', 'Gifted', 'Before I Fall', 'Ida', 'Matador', 'Tokyo Decadence', 'Wide Sargasso Sea', 'Crash', 'Elles', 'Crash', 'Pink Flamingos', 'Law of Desire']
    

    Checking the number of elements in the 'name_2' list.

    In [338]:
    len(name_2)
    
    Out[338]:
    50

    Getting the Names and Profit of movies that were released in the Summer (Season code 3).

    In [331]:
    profit_3 = []
    name_3 = []
    for i,x in enumerate(df_profit_season.Season):
        if x == 3:profit_3.append(df_profit_season.Profit[i])
        
    for i,x in enumerate(df_profit_season.Season):
        if x == 3:name_3.append(df_profit_season.Name[i])
    

    Showing the 'profit_3' list.

    In [339]:
    print(profit_3)
    
    [26604054, 60133905, 53273049, 14131551, 8153415, 2669782, 14718173, 70975239, 36918287, 70986904, 33102988, 74830111, 120036382, 81120329, 12815212, 7423752, 544368315, 42892670, 43947950, 12469621, 255500000, 216100000, 7657973, 941214868, 267142000, 4478084, 132552290, 188120004, 41540205, 188265198, 44168692, 11477345, 67356170, 143806510, 1927779, 2548651, 16283563, 8000000, 18912216]
    

    Checking the number of elements in the 'profit_3' list.

    In [413]:
    len(profit_3)
    
    Out[413]:
    39

    Showing the 'name_3' list.

    In [340]:
    print(name_3)
    
    ['The Debt', 'Hereditary', 'Boyhood', "Winter's Bone", 'We Are Your Friends', 'A Ghost Story', 'Endless Love', 'War Room', 'Urban Cowboy', 'War Room', 'Overcomer', 'The Lake House', 'Phenomenon', 'Contact', 'Honeysuckle Rose', 'The Night the Lights Went Out in Georgia', 'Tex', 'Staying Alive', 'The Little Rascals', 'Ramona and Beezus', 'The Hunchback of Notre Drame', 'Babe', 'Kit Kittredge: An American Girl', 'The Lion King 1994', 'Bambi 1942', 'Charlie St. Cloud', 'Step Up Revolution', 'The Help', 'The Giver', 'Me Before You', 'One Day', 'Wish Upon', 'If I Stay', 'Lights Out', 'Another Earth', 'Arabian Nights', 'Natural Born Killers', 'Beyond the Valley of the Dolls', 'Kids']
    

    Checking the number of elements in the 'name_3' list.

    In [341]:
    len(name_3)
    
    Out[341]:
    39

    Getting the Names and Profit of movies that were released in the Autumn (Season code 4).

    In [332]:
    profit_4 = []
    name_4 = []
    for i,x in enumerate(df_profit_season.Season):
        if x == 4:profit_4.append(df_profit_season.Profit[i])
        
    for i,x in enumerate(df_profit_season.Season):
        if x == 4:name_4.append(df_profit_season.Name[i])
    

    Showing the 'profit_4' list.

    In [342]:
    print(profit_4)
    
    [307567189, 19966854, 13147416, 129558438, 54735925, 9898681, 17017873, 8270399, 23262783, 23830713, 31043521, 12417298, 69233867, 12499242, 222016, 17033227, 35669037, 9295324, 4328516, 19282640, 4438911, 48766923, 1500000, 2000000, 47784, 59068724, 284604712, 4609597, 285937718, 4344615, 6741732, 34605762, 32973297, 48954968, 5164458, 31440294, 150297525, 11587135, 418656843, 35099643, 58985708, 23794409, 52500000, 5850377, 583698673, 77551594, 35552675, 163591522, 58660270, 22004627, 156127894, 122498338, 136567581, 15059418, 2281732, 57086711, 20044909, 20069303, 51076141, 72831866, 10369708, 33185884, 4152584, 3478400, 8404, 18912216, 52091915, 15465835, 15390895, 1315026, 201120004, 50167430, 2311944, 3664240, 1038916, 50167430, 3546453, 958404]
    

    Checking the number of elements in the 'profit_4' list.

    In [415]:
    len(profit_4)
    
    Out[415]:
    78

    Showing the 'name_4' list.

    In [343]:
    print(name_4)
    
    ['Gone Girl', 'Crimson Peak', 'The Master', 'Flight', 'The Ides of March', 'Nocturnal Animals', 'For Colored Girls', 'Let Me In', 'Room', 'Arbitrage', 'Carol', 'Melancholia', 'Manchester by the Sea', 'Addicted', 'Take Shelter', 'Margin Call', 'Whiplash', 'The Florida Project', 'Knock Knock', 'Buried', 'Martha Marcy May Marlene', 'Ordinary People', 'Rich and Famous', 'Raggedy Man', 'Hugo', 'Dolphin Tale', 'Wonder', 'Somewhere in Time', 'Wonder', 'Tuck Everlasting', 'Dreamer', 'August Rush', 'Fireproof', 'The Remains of the Day', 'Pure Country', 'A River Runs Through It', 'Resurrection', 'Prancer', 'Beauty and the Beast 1991', 'The Black Stallion', "Charlotte's Web", 'Giant', 'The Ten Commandments 1966', 'The Quiet Man', 'Gravity', 'Contagion', 'Burlesque', 'Creed II', 'Hereafter', 'Anna Karenina', 'Arrival', 'Bridge of Spies', 'Creed', 'The Best of Me', 'The Light Between Oceans', 'The Book Thief', 'Suffragette', 'The Perks of Being a Wallflower', 'Brooklyn', 'Ouija: Origin of Evil', 'The Words', 'Courageous', 'Mustang', 'Like Crazy', 'Whore', 'Kids', 'Lust, Caution', 'Blue Is the Warmest Colour', 'Blue Is the Warmest Colour', 'Two Girls and a Guy', 'Hell', 'Se, jie', 'The Evil Dead', 'Clerks', 'Bad Lieutenant', 'Lust, Caution ', 'Happiness 1998', 'Whore 1991']
    

    Checking the number of elements in the 'name_4' list.

    In [344]:
    len(name_4)
    
    Out[344]:
    78

    Getting the number of movies that made Profit in the Winter.

    In [345]:
    sum_1 = []
    for i in profit_1:
        if i < 0: continue
        else: sum_1.append(i)
    len(sum_1)
    
    Out[345]:
    58

    Building a list that repeats the total Profit generated by the profitable Drama movies released in the Winter, once for each Winter movie.

    In [347]:
    var1 = []
    for i in profit_1: var1.append(sum(sum_1))
    print(var1)#showing the var1 list
    
    [5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506]
    

    Getting the number of movies that made Profit in the Spring.

    In [349]:
    sum_2 = []
    for i in profit_2:
        if i < 0: continue
        else: sum_2.append(i)
    len(sum_2)
    
    Out[349]:
    50

    Building a list that repeats the total Profit generated by the profitable Drama movies released in the Spring, once for each Spring movie.

    In [350]:
    var2 = []
    for i in profit_2: var2.append(sum(sum_2))
    print(var2)#showing the var2 list
    
    [2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541]
    

    Getting the number of movies that made Profit in the Summer.

    In [351]:
    sum_3 = []
    for i in profit_3:
        if i < 0: continue
        else: sum_3.append(i)
    len(sum_3)
    
    Out[351]:
    39

    Building a list that repeats the total Profit generated by the profitable Drama movies released in the Summer, once for each Summer movie.

    In [353]:
    var3 = []
    for i in profit_3: var3.append(sum(sum_3))
    print(var3)#showing the var3 list
    
    [3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237]
    

    Getting the number of movies that made Profit in the Autumn.

    In [354]:
    sum_4 = []
    for i in profit_4:
        if i < 0: continue
        else: sum_4.append(i)
    len(sum_4)
    
    Out[354]:
    78

    Building a list that repeats the total Profit generated by the profitable Drama movies released in the Autumn, once for each Autumn movie.

    In [355]:
    var4 = []
    for i in profit_4: var4.append(sum(sum_4))
    print(var4)#showing the var4 list
    
    [4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036]
    

    Pairing the Names of Drama movies that were released in Winter with the corresponding Profit for the visualizations below.

    In [361]:
    for x in range(len(name_1)):
        print("[ '",name_1[x],"'",',', profit_1[x],'],')
    
    [ ' Django Unchained ' , 349948323 ],
    [ ' Fifty Shades Darker ' , 326398492 ],
    [ ' Fifty Shades Freed ' , 316350619 ],
    [ ' Zero Dark Thirty ' , 82112435 ],
    [ ' Fifty Shades of Grey ' , 530998101 ],
    [ ' Black Swan ' , 318266710 ],
    [ ' If Beale Street Could Talk ' , 7859167 ],
    [ ' Quartet ' , 45178935 ],
    [ ' We Need to Talk About Kevin ' , 3765283 ],
    [ ' Mommy ' , 12636004 ],
    [ ' The Witch ' , 36954520 ],
    [ ' Blue Valentine ' , 15566240 ],
    [ ' Ghost Story ' , 1851683 ],
    [ ' Zoot Suit ' , 556082 ],
    [ ' The Lunchbox ' , 10531500 ],
    [ ' Little Women ' , 176601214 ],
    [ ' The Jazz Singer ' , 26696000 ],
    [ ' A Walk to Remember ' , 35694916 ],
    [ ' Bridge to Terabithia ' , 120587063 ],
    [ ' Mr. Holland's Opus ' , 83269971 ],
    [ ' Sense and Sensibility ' , 118582776 ],
    [ ' The Secret of Roan Inish ' , 3101815 ],
    [ ' Forever Young ' , 107956187 ],
    [ ' Taps ' , 21856053 ],
    [ ' On Golden Pond ' , 104285432 ],
    [ ' Absence of Malice ' , 28716963 ],
    [ ' Footloose ' , 71808942 ],
    [ ' Lassie Come Home ' , 3851000 ],
    [ ' The Tale of Despereaux ' , 30482317 ],
    [ ' My Fair Lady 1964 ' , 55071636 ],
    [ ' Sing ' , 559454789 ],
    [ ' The Post ' , 129748880 ],
    [ ' The Impossible ' , 129590606 ],
    [ ' The Rite ' , 60143987 ],
    [ ' Collateral Beauty ' , 49309093 ],
    [ ' True Grit ' , 217276928 ],
    [ ' The Vow ' , 167618160 ],
    [ ' Safe Haven ' , 66050951 ],
    [ ' Dear John ' , 117033509 ],
    [ ' Rings ' , 57917283 ],
    [ ' Fences ' , 40282881 ],
    [ ' The Roommate ' , 36545707 ],
    [ ' The Woman in Black ' , 113955898 ],
    [ ' Country Strong ' , 5601987 ],
    [ ' Project Almanac ' , 20909437 ],
    [ ' Amour ' , 27087044 ],
    [ ' Black or White ' , 12971021 ],
    [ ' The Bye Bye Man ' , 23787727 ],
    [ ' Still Alice ' , 36699612 ],
    [ ' Rabbit Hole ' , 1205034 ],
    [ ' Shame ' , 13912841 ],
    [ ' The Dreamers ' , 121165 ],
    [ ' Shame ' , 13912841 ],
    [ ' The Dreamers ' , 307113 ],
    [ ' Shame ' , 13912841 ],
    [ ' Blue Valentine ' , 15566240 ],
    [ ' Shame ' , 13912841 ],
    [ ' Last Tango in Paris ' , 34897711 ],
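
    The printed rows above pad every title with spaces, and a title containing an apostrophe (such as "Mr. Holland's Opus") ends up with unbalanced quotes, so the output cannot be pasted back as valid Python without editing. One possible alternative, sketched here with a three-item sample of the Winter lists, is to build the pairs directly with `zip` and let `repr` handle the quoting.

```python
# Small sample of the Winter lists, including a title with an apostrophe.
name_1 = ['Django Unchained', 'Fifty Shades Darker', "Mr. Holland's Opus"]
profit_1 = [349948323, 326398492, 83269971]

# Pair each name with its profit. repr() picks a quoting style that stays
# valid even when the title itself contains an apostrophe.
pairs = [[name, profit] for name, profit in zip(name_1, profit_1)]
for pair in pairs:
    print(repr(pair) + ',')
```

    The `pairs` list can then be passed straight to a plotting or table function, so the copy-and-paste step is not needed at all.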
    

    Pairing the Names of Drama movies that were released in Spring with the corresponding Profit for the visualizations below.

    In [425]:
    for x in range(len(name_2)):
        print("[ '",name_2[x],"'",',', profit_2[x],'],')
    
    [ ' Priest ' , 24154026 ],
    [ ' The Water Diviner ' , 8554727 ],
    [ ' Ex Machina ' , 25358392 ],
    [ ' Stoker ' , 34913 ],
    [ ' Before Midnight ' , 20251930 ],
    [ ' Silent House ' , 14610760 ],
    [ ' Locke ' , 88390 ],
    [ ' Unsane ' , 12744931 ],
    [ ' Palo Alto ' , 156309 ],
    [ ' Sound of My Voice ' , 294448 ],
    [ ' Fame ' , 68711836 ],
    [ ' The Last Song ' , 72678948 ],
    [ ' Cinderella ' , 447351353 ],
    [ ' Akeelah and the Bee ' , 10948425 ],
    [ ' The Last Song ' , 69137047 ],
    [ ' God's Not Dead ' , 62667874 ],
    [ ' The Spanish Prisoner ' , 3835130 ],
    [ ' Rocky III ' , 108052686 ],
    [ ' Tender Mercies ' , 3943124 ],
    [ ' The Natural ' , 20000000 ],
    [ ' A Sunday in the Country ' , 1711143 ],
    [ ' The Rookie ' , 58693537 ],
    [ ' Pollyanna ' , 1250000 ],
    [ ' The Rookie ' , 58491516 ],
    [ ' The Secret Garden ' , 293281000 ],
    [ ' The Sound of Music ' , 278014195 ],
    [ ' Hachiko: A Dog's Story ' , 37707417 ],
    [ ' Three Cions in the Fountain ' , 10300000 ],
    [ ' Water for Elephants ' , 78809717 ],
    [ ' The Tree of Life ' , 26721826 ],
    [ ' The Longest Ride ' , 29802928 ],
    [ ' The Age of Adaline ' , 38984536 ],
    [ ' The Lucky One ' , 71633833 ],
    [ ' Draft Day ' , 4847480 ],
    [ ' A Quiet Place ' , 317522294 ],
    [ ' Beastly ' , 21028230 ],
    [ ' Remember Me ' , 40506120 ],
    [ ' Everything, Everything ' , 51603136 ],
    [ ' Mud ' , 21556959 ],
    [ ' Gifted ' , 29964656 ],
    [ ' Before I Fall ' , 13945682 ],
    [ ' Ida ' , 12698355 ],
    [ ' Matador ' , 4856268 ],
    [ ' Tokyo Decadence ' , 257845 ],
    [ ' Wide Sargasso Sea ' , 659312 ],
    [ ' Crash ' , 89410061 ],
    [ ' Elles ' , 256669 ],
    [ ' Crash ' , 94673038 ],
    [ ' Pink Flamingos ' , 401802 ],
    [ ' Law of Desire ' , 858737 ],
    

    Pairing the Names of Drama movies that were released in Summer with the corresponding Profit for the visualizations below.

    In [426]:
    for x in range(len(name_3)):
        print("[ '",name_3[x],"'",',', profit_3[x],'],')
    
    [ ' The Debt ' , 26604054 ],
    [ ' Hereditary ' , 60133905 ],
    [ ' Boyhood ' , 53273049 ],
    [ ' Winter's Bone ' , 14131551 ],
    [ ' We Are Your Friends ' , 8153415 ],
    [ ' A Ghost Story ' , 2669782 ],
    [ ' Endless Love ' , 14718173 ],
    [ ' War Room ' , 70975239 ],
    [ ' Urban Cowboy ' , 36918287 ],
    [ ' War Room ' , 70986904 ],
    [ ' Overcomer ' , 33102988 ],
    [ ' The Lake House ' , 74830111 ],
    [ ' Phenomenon ' , 120036382 ],
    [ ' Contact ' , 81120329 ],
    [ ' Honeysuckle Rose ' , 12815212 ],
    [ ' The Night the Lights Went Out in Georgia ' , 7423752 ],
    [ ' Tex ' , 544368315 ],
    [ ' Staying Alive ' , 42892670 ],
    [ ' The Little Rascals ' , 43947950 ],
    [ ' Ramona and Beezus ' , 12469621 ],
    [ ' The Hunchback of Notre Drame ' , 255500000 ],
    [ ' Babe ' , 216100000 ],
    [ ' Kit Kittredge: An American Girl ' , 7657973 ],
    [ ' The Lion King 1994 ' , 941214868 ],
    [ ' Bambi 1942 ' , 267142000 ],
    [ ' Charlie St. Cloud ' , 4478084 ],
    [ ' Step Up Revolution ' , 132552290 ],
    [ ' The Help ' , 188120004 ],
    [ ' The Giver ' , 41540205 ],
    [ ' Me Before You ' , 188265198 ],
    [ ' One Day ' , 44168692 ],
    [ ' Wish Upon ' , 11477345 ],
    [ ' If I Stay ' , 67356170 ],
    [ ' Lights Out ' , 143806510 ],
    [ ' Another Earth ' , 1927779 ],
    [ ' Arabian Nights ' , 2548651 ],
    [ ' Natural Born Killers ' , 16283563 ],
    [ ' Beyond the Valley of the Dolls ' , 8000000 ],
    [ ' Kids ' , 18912216 ],
    

    Pairing the Names of Drama movies that were released in Autumn with the corresponding Profit for the visualizations below.

    In [427]:
    for x in range(len(name_4)):
        print("[ '",name_4[x],"'",',', profit_4[x],'],')
    
    [ ' Gone Girl ' , 307567189 ],
    [ ' Crimson Peak ' , 19966854 ],
    [ ' The Master ' , 13147416 ],
    [ ' Flight ' , 129558438 ],
    [ ' The Ides of March ' , 54735925 ],
    [ ' Nocturnal Animals ' , 9898681 ],
    [ ' For Colored Girls ' , 17017873 ],
    [ ' Let Me In ' , 8270399 ],
    [ ' Room ' , 23262783 ],
    [ ' Arbitrage ' , 23830713 ],
    [ ' Carol ' , 31043521 ],
    [ ' Melancholia ' , 12417298 ],
    [ ' Manchester by the Sea ' , 69233867 ],
    [ ' Addicted ' , 12499242 ],
    [ ' Take Shelter ' , 222016 ],
    [ ' Margin Call ' , 17033227 ],
    [ ' Whiplash ' , 35669037 ],
    [ ' The Florida Project ' , 9295324 ],
    [ ' Knock Knock ' , 4328516 ],
    [ ' Buried ' , 19282640 ],
    [ ' Martha Marcy May Marlene ' , 4438911 ],
    [ ' Ordinary People ' , 48766923 ],
    [ ' Rich and Famous ' , 1500000 ],
    [ ' Raggedy Man ' , 2000000 ],
    [ ' Hugo ' , 47784 ],
    [ ' Dolphin Tale ' , 59068724 ],
    [ ' Wonder ' , 284604712 ],
    [ ' Somewhere in Time ' , 4609597 ],
    [ ' Wonder ' , 285937718 ],
    [ ' Tuck Everlasting ' , 4344615 ],
    [ ' Dreamer ' , 6741732 ],
    [ ' August Rush ' , 34605762 ],
    [ ' Fireproof ' , 32973297 ],
    [ ' The Remains of the Day ' , 48954968 ],
    [ ' Pure Country ' , 5164458 ],
    [ ' A River Runs Through It ' , 31440294 ],
    [ ' Resurrection ' , 150297525 ],
    [ ' Prancer ' , 11587135 ],
    [ ' Beauty and the Beast 1991 ' , 418656843 ],
    [ ' The Black Stallion ' , 35099643 ],
    [ ' Charlotte's Web ' , 58985708 ],
    [ ' Giant ' , 23794409 ],
    [ ' The Ten Commandments 1966 ' , 52500000 ],
    [ ' The Quiet Man ' , 5850377 ],
    [ ' Gravity ' , 583698673 ],
    [ ' Contagion ' , 77551594 ],
    [ ' Burlesque ' , 35552675 ],
    [ ' Creed II ' , 163591522 ],
    [ ' Hereafter ' , 58660270 ],
    [ ' Anna Karenina ' , 22004627 ],
    [ ' Arrival ' , 156127894 ],
    [ ' Bridge of Spies ' , 122498338 ],
    [ ' Creed ' , 136567581 ],
    [ ' The Best of Me ' , 15059418 ],
    [ ' The Light Between Oceans ' , 2281732 ],
    [ ' The Book Thief ' , 57086711 ],
    [ ' Suffragette ' , 20044909 ],
    [ ' The Perks of Being a Wallflower ' , 20069303 ],
    [ ' Brooklyn ' , 51076141 ],
    [ ' Ouija: Origin of Evil ' , 72831866 ],
    [ ' The Words ' , 10369708 ],
    [ ' Courageous ' , 33185884 ],
    [ ' Mustang ' , 4152584 ],
    [ ' Like Crazy ' , 3478400 ],
    [ ' Whore ' , 8404 ],
    [ ' Kids ' , 18912216 ],
    [ ' Lust, Caution ' , 52091915 ],
    [ ' Blue Is the Warmest Colour ' , 15465835 ],
    [ ' Blue Is the Warmest Colour ' , 15390895 ],
    [ ' Two Girls and a Guy ' , 1315026 ],
    [ ' Hell ' , 201120004 ],
    [ ' Se, jie ' , 50167430 ],
    [ ' The Evil Dead ' , 2311944 ],
    [ ' Clerks ' , 3664240 ],
    [ ' Bad Lieutenant ' , 1038916 ],
    [ ' Lust, Caution  ' , 50167430 ],
    [ ' Happiness 1998 ' , 3546453 ],
    [ ' Whore 1991 ' , 958404 ],
    

    Showing how often each Profit value occurs among the Drama movies from the 'df_profit_season' dataframe that were released in the Winter, after rounding each Profit to the nearest $10 million. The rounded values are stored in a list called 'winter_group' and counted with Counter.

    In [365]:
    winter_group = []
    for i in profit_1:winter_group.append(round_to_multiple(i,10000000))
    
    Counter(winter_group)
    
    Out[365]:
    Counter({350000000: 1,
             330000000: 1,
             320000000: 2,
             80000000: 2,
             530000000: 1,
             10000000: 9,
             50000000: 2,
             0: 8,
             40000000: 5,
             20000000: 5,
             180000000: 1,
             30000000: 5,
             120000000: 3,
             110000000: 2,
             100000000: 1,
             70000000: 2,
             60000000: 3,
             560000000: 1,
             130000000: 2,
             220000000: 1,
             170000000: 1})
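
    The `round_to_multiple` helper is defined elsewhere in the notebook and is not shown in this section. Judging from the output above, it rounds to the nearest multiple; the version below is only an assumed equivalent, applied to a small sample of the Winter profits.

```python
from collections import Counter

def round_to_multiple(value, multiple):
    """Round value to the nearest multiple (assumed behaviour of the notebook helper)."""
    return multiple * round(value / multiple)

# A few Winter profits taken from the 'profit_1' list.
sample = [349948323, 326398492, 316350619, 318266710, 82112435, 83269971, 121165]
winter_group = [round_to_multiple(p, 10_000_000) for p in sample]
print(Counter(winter_group))
```

    Note that Python's built-in `round` sends exact halves to the nearest even integer, so a profit of exactly $5,000,000 would land in the 0 bucket under this sketch. Profits below $5 million all collapse into the 0 bucket, which is why that bucket is so large in the full output.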

    The maximum Profit of Drama Movies from the 'df_profit_season' dataframe that were released in Winter is about $559 million.

    In [362]:
    max(profit_1)
    
    Out[362]:
    559454789

    The minimum Profit of Drama Movies from the 'df_profit_season' dataframe that were released in Winter is about $121,000.

    In [363]:
    min(profit_1)
    
    Out[363]:
    121165
    In [465]:
    # Winter profits from $0 to $50 million
    for i in profit_1:
        if 0 <= i <=50000000:print(i)
    
    7859167
    45178935
    3765283
    12636004
    36954520
    15566240
    1851683
    556082
    10531500
    26696000
    35694916
    3101815
    21856053
    28716963
    3851000
    30482317
    49309093
    40282881
    36545707
    5601987
    20909437
    27087044
    12971021
    23787727
    36699612
    1205034
    13912841
    121165
    13912841
    307113
    13912841
    15566240
    13912841
    34897711
    
    In [466]:
    # Winter profits from $50 million to $100 million
    for i in profit_1:
        if 50000000 <= i <=100000000:print(i)
    
    82112435
    83269971
    71808942
    55071636
    60143987
    66050951
    57917283
    
    In [467]:
    # Winter profits from $100 million to $200 million
    for i in profit_1:
        if 100000000<= i <=200000000:print(i)
    
    176601214
    120587063
    118582776
    107956187
    104285432
    129748880
    129590606
    167618160
    117033509
    113955898
    
    In [468]:
    # Winter profits from $200 million to $300 million
    for i in profit_1:
        if 200000000<= i <=300000000:print(i)
    
    217276928
    
    In [469]:
    # Winter profits from $300 million to $400 million
    for i in profit_1:
        if 300000000<= i <=400000000:print(i)
    
    349948323
    326398492
    316350619
    318266710
    
    In [470]:
    # Winter profits from $400 million to $600 million
    for i in profit_1:
        if 400000000<= i <=600000000:print(i)
    
    530998101
    559454789
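
    The range checks in the cells above use `<=` on both ends, so a profit that falls exactly on a boundary (for example 50000000) would be printed in two neighbouring ranges. A possible way to count each movie once is to use half-open bins (low <= profit < high). The sketch below does this with the same edges and a small sample of the Winter profits; `bin_label` is an illustrative helper, not part of the notebook.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges in dollars, matching the Winter ranges used in the notebook cells.
edges = [0, 50_000_000, 100_000_000, 200_000_000,
         300_000_000, 400_000_000, 600_000_000]

def bin_label(profit, edges):
    """Return the (low, high) bin containing profit, using low <= profit < high."""
    idx = bisect_right(edges, profit) - 1
    if idx < 0 or idx >= len(edges) - 1:
        return None  # profit lies outside the covered range
    return (edges[idx], edges[idx + 1])

# A few Winter profits taken from the 'profit_1' list.
sample = [7859167, 45178935, 82112435, 176601214,
          217276928, 349948323, 559454789]
counts = Counter(bin_label(p, edges) for p in sample)
for low, high in zip(edges, edges[1:]):
    print(f"{low:>11,} to {high:>11,}: {counts[(low, high)]}")
```

    Dividing each count by the number of movies in the season gives the share of movies in that range, which is the kind of percentage the visualizations rely on.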
    

    Showing how often each Profit value occurs among the Drama movies from the 'df_profit_season' dataframe that were released in the Spring, after rounding each Profit to the nearest $10 million. The rounded values are stored in a list called 'spring_group' and counted with Counter.

    In [367]:
    spring_group = []
    for i in profit_2:spring_group.append(round_to_multiple(i,10000000))
    
    Counter(spring_group)
    
    Out[367]:
    Counter({20000000: 5,
             10000000: 7,
             30000000: 4,
             0: 15,
             70000000: 4,
             450000000: 1,
             60000000: 3,
             110000000: 1,
             290000000: 1,
             280000000: 1,
             40000000: 3,
             80000000: 1,
             320000000: 1,
             50000000: 1,
             90000000: 2})

    The maximum Profit of Drama Movies from the 'df_profit_season' dataframe that were released in Spring is about $447 million.

    In [501]:
    max(profit_2)
    
    Out[501]:
    447351353

    The minimum Profit of Drama Movies from the 'df_profit_season' dataframe that were released in Spring is about $35,000.

    In [502]:
    min(profit_2)
    
    Out[502]:
    34913
    In [473]:
    # Spring profits from $0 to $50 million
    for i in profit_2:
        if 0 <= i <=50000000:print(i)
    
    24154026
    8554727
    25358392
    34913
    20251930
    14610760
    88390
    12744931
    156309
    294448
    10948425
    3835130
    3943124
    20000000
    1711143
    1250000
    37707417
    10300000
    26721826
    29802928
    38984536
    4847480
    21028230
    40506120
    21556959
    29964656
    13945682
    12698355
    4856268
    257845
    659312
    256669
    401802
    858737
    
    In [474]:
    # Spring profits from $50 million to $100 million
    for i in profit_2:
        if 50000000 <= i <=100000000:print(i)
    
    68711836
    72678948
    69137047
    62667874
    58693537
    58491516
    78809717
    71633833
    51603136
    89410061
    94673038
    
    In [475]:
    # Spring profits from $100 million to $200 million
    for i in profit_2:
        if 100000000<= i <=200000000:print(i)
    
    108052686
    
    In [476]:
    # Spring profits from $200 million to $300 million
    for i in profit_2:
        if 200000000<= i <=300000000:print(i)
    
    293281000
    278014195
    
    In [477]:
    # Spring profits from $300 million to $400 million
    for i in profit_2:
        if 300000000<= i <=400000000:print(i)
    
    317522294
    
    In [503]:
    # Spring profits of $400 million or more
    for i in profit_2:
        if 400000000<= i :print(i)
    
    447351353
    

    Showing how often each Profit value occurs among the Drama movies from the 'df_profit_season' dataframe that were released in the Summer, after rounding each Profit to the nearest $10 million. The rounded values are stored in a list called 'summer_group' and counted with Counter.

    In [369]:
    summer_group = []
    for i in profit_3:summer_group.append(round_to_multiple(i,10000000))
    
    Counter(summer_group)
    
    Out[369]:
    Counter({30000000: 2,
             60000000: 1,
             50000000: 1,
             10000000: 9,
             0: 4,
             70000000: 4,
             40000000: 5,
             120000000: 1,
             80000000: 1,
             540000000: 1,
             260000000: 1,
             220000000: 1,
             940000000: 1,
             270000000: 1,
             130000000: 1,
             190000000: 2,
             140000000: 1,
             20000000: 2})

    The maximum Profit of Drama Movies from the 'df_profit_season' dataframe that were released in Summer is about $941 million.

    In [504]:
    max(profit_3)
    
    Out[504]:
    941214868

    The minimum Profit of Drama Movies from the 'df_profit_season' dataframe that were released in Summer is about $1.9 million.

    In [505]:
    min(profit_3)
    
    Out[505]:
    1927779
    In [482]:
    # Summer profits from $0 to $50 million
    for i in profit_3:
        if 0 <= i <=50000000:print(i)
    
    26604054
    14131551
    8153415
    2669782
    14718173
    36918287
    33102988
    12815212
    7423752
    42892670
    43947950
    12469621
    7657973
    4478084
    41540205
    44168692
    11477345
    1927779
    2548651
    16283563
    8000000
    18912216
    
    In [483]:
    # Summer profits from $50 million to $100 million
    for i in profit_3:
        if 50000000 <= i <=100000000:print(i)
    
    60133905
    53273049
    70975239
    70986904
    74830111
    81120329
    67356170
    
    In [484]:
    # Summer profits from $100 million to $200 million
    for i in profit_3:
        if 100000000 <= i <=200000000:print(i)
    
    120036382
    132552290
    188120004
    188265198
    143806510
    
    In [485]:
    # Summer profits from $200 million to $300 million
    for i in profit_3:
        if 200000000 <= i <=300000000:print(i)
    
    255500000
    216100000
    267142000
    
    In [488]:
    # Summer profits from $500 million to $600 million
    for i in profit_3:
        if 500000000 <= i <=600000000:print(i)
    
    544368315
    
    In [491]:
    # Summer profits of $800 million or more
    for i in profit_3:
        if 800000000 <= i :print(i)
    
    941214868
    

    Showing how often each Profit value occurs among the Drama movies from the 'df_profit_season' dataframe that were released in the Autumn, after rounding each Profit to the nearest $50 million. The rounded values are stored in a list called 'autumn_group' and counted with Counter.

    In [492]:
    autumn_group = []
    for i in profit_4:autumn_group.append(round_to_multiple(i,50000000))
    
    collections.Counter(autumn_group)
    
    Out[492]:
    Counter({300000000: 3,
             0: 43,
             150000000: 5,
             50000000: 22,
             400000000: 1,
             600000000: 1,
             100000000: 2,
             200000000: 1})

    The maximum Profit of Drama Movies from the 'df_profit_season' dataframe that were released in Autumn is about $584 million.

    In [506]:
    max(profit_4)
    
    Out[506]:
    583698673

    The minimum Profit of Drama Movies from the 'df_profit_season' dataframe that were released in Autumn is $8,404.

    In [507]:
    min(profit_4)
    
    Out[507]:
    8404
    In [493]:
    # Autumn profits from $0 to $50 million
    for i in profit_4:
        if 0 <= i <=50000000:print(i)
    
    19966854
    13147416
    9898681
    17017873
    8270399
    23262783
    23830713
    31043521
    12417298
    12499242
    222016
    17033227
    35669037
    9295324
    4328516
    19282640
    4438911
    48766923
    1500000
    2000000
    47784
    4609597
    4344615
    6741732
    34605762
    32973297
    48954968
    5164458
    31440294
    11587135
    35099643
    23794409
    5850377
    35552675
    22004627
    15059418
    2281732
    20044909
    20069303
    10369708
    33185884
    4152584
    3478400
    8404
    18912216
    15465835
    15390895
    1315026
    2311944
    3664240
    1038916
    3546453
    958404
    
    In [494]:
    # Print profits between 50,000,000 and 100,000,000 (inclusive)
    for i in profit_4:
        if 50000000 <= i <= 100000000:
            print(i)
    
    54735925
    69233867
    59068724
    58985708
    52500000
    77551594
    58660270
    57086711
    51076141
    72831866
    52091915
    50167430
    50167430
    
    In [495]:
    # Print profits between 100,000,000 and 200,000,000 (inclusive)
    for i in profit_4:
        if 100000000 <= i <= 200000000:
            print(i)
    
    129558438
    150297525
    163591522
    156127894
    122498338
    136567581
    
    In [496]:
    # Print profits between 200,000,000 and 300,000,000 (inclusive)
    for i in profit_4:
        if 200000000 <= i <= 300000000:
            print(i)
    
    284604712
    285937718
    201120004
    
    In [499]:
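    # Print profits between 500,000,000 and 600,000,000 (inclusive)
    # Note: this cell iterates over profit_3, not the Autumn list profit_4
    for i in profit_3:
        if 500000000 <= i <= 600000000:
            print(i)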
    
    544368315
    
    In [500]:
    # Print profits of 600,000,000 or more
    # Note: this cell iterates over profit_3, not the Autumn list profit_4
    for i in profit_3:
        if 600000000 <= i:
            print(i)
    
    941214868
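
The range-by-range loops above can also be expressed as a single pass that reports both the count and the percentage for every range. The following is a minimal sketch, not the notebook's method: the function name `bin_shares` and the chosen edges are assumptions, and the sample is only the ten `profit_3` values printed in the cells above, so the percentages describe that sample and not the full season.

```python
import bisect
import collections


def bin_shares(values, edges):
    """Return {(low, high): (count, percent)} for half-open ranges [low, high).

    Values outside the outermost edges are not counted in any range,
    but they still contribute to the total used for the percentages.
    """
    counts = collections.Counter()
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        if 0 <= idx < len(edges) - 1:
            counts[idx] += 1
    total = len(values)
    return {
        (edges[i], edges[i + 1]): (
            counts[i],
            round(100 * counts[i] / total, 1) if total else 0.0,
        )
        for i in range(len(edges) - 1)
    }


# The profit_3 values printed by the cells above (a sample, not the full list).
profit_sample = [
    120036382, 132552290, 188120004, 188265198, 143806510,
    255500000, 216100000, 267142000,
    544368315,
    941214868,
]
# Hypothetical range edges chosen for illustration.
edges = [100_000_000, 200_000_000, 300_000_000, 500_000_000, 600_000_000, 1_000_000_000]

shares = bin_shares(profit_sample, edges)
for (low, high), (count, percent) in shares.items():
    print(f"{low:>13,} to {high:>13,}: {count} movies ({percent}%)")
```

Half-open ranges are used so that a value sitting exactly on a boundary is counted once; the original loops use `<=` on both ends, which would print such a value in two neighbouring ranges.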
    

    3. Conclusion: Predictive Analysis

    Section 1: This section examines whether there is a correlation between the Budget, Opening Weekend and Profit of movies in the Drama Genre.

    A 3D scatter plot of the Profit, Budget
    and Opening Weekend Of Drama Movies.

    A 3D scatter plot and linear plane of the Profit, Budget
    and Opening Weekend Of Drama Movies.
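
One way to put a number on the relationship described in Section 1 is the Pearson correlation coefficient for each pair of variables. The sketch below is an illustration only: the function `pearson` is written here for the example, and the three lists are made-up placeholder values in millions of dollars, not data from the notebook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values (millions of dollars), NOT taken from the notebook's data.
budget = [10, 20, 30, 40, 50]
opening_weekend = [3, 5, 9, 11, 16]
profit = [5, 30, 20, 60, 55]

correlations = {
    "budget vs opening weekend": pearson(budget, opening_weekend),
    "budget vs profit": pearson(budget, profit),
    "opening weekend vs profit": pearson(opening_weekend, profit),
}
for pair, r in correlations.items():
    print(f"{pair}: r = {r:.2f}")
```

A coefficient near 1 or -1 indicates a strong linear relationship and a value near 0 indicates a weak one; it does not by itself show that one variable causes the other.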

    Section 2: The purpose of this section is to partition the movies in the Drama Genre into clusters based on the month the movie was released. Then, based on the clusters, calculate the probability of generating a certain amount of Revenue based on the Budget spent on the movie.

    A 3D scatter plot of the Budget, Month Released
    and Revenue Of Drama Movies.

    A 3D scatter plot and two clusters of the Budget,
    Month Released and Revenue Of Drama Movies.
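
The clustering step itself is not shown in this section, so the following is only a sketch of the general idea of splitting movies into two groups. It assumes a one-dimensional two-cluster k-means (Lloyd's algorithm) on Budget alone, whereas the notebook clusters on several variables; the function name `two_means_1d` and the budget values (in millions) are placeholders invented for the example.

```python
def two_means_1d(values, iterations=20):
    """Split values into a low and a high cluster with 1-D k-means (k = 2)."""
    low, high = min(values), max(values)  # initial cluster centres
    group_low, group_high = list(values), []
    for _ in range(iterations):
        # Assign every value to its nearest centre (ties go to the low cluster).
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_high:  # all values identical: nothing to split
            break
        new_low = sum(group_low) / len(group_low)
        new_high = sum(group_high) / len(group_high)
        if (new_low, new_high) == (low, high):  # centres stopped moving
            break
        low, high = new_low, new_high
    return group_low, group_high


# Placeholder budgets in millions of dollars, NOT taken from the notebook's data.
budgets = [42, 55, 60, 80, 95, 150, 200, 320, 400]
group_one, group_two = two_means_1d(budgets)
print("Group one:", group_one)
print("Group two:", group_two)
```

Once the groups exist, the probability for each Revenue range is the share of the group's movies that fall into that range.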

    Section 3: The purpose of this section is to partition the movies in the Drama Genre into clusters based on the season the movie was released. Then, based on the clusters, calculate the probability of generating a certain amount of Profit based on the Opening Weekend of the movie.

    A 3D scatter plot of the Opening Weekend, Profit
    and Season Of Drama Movies.

    A 3D scatter plot and two clusters of the Opening Weekend,
    Profit and Season Of Drama Movies.

    Section 4: The purpose of this section is to partition the movies in the Drama Genre into clusters based on the season the movie was released. Then, based on the clusters, calculate the probability of generating a certain Opening Weekend amount based on the Budget spent on the movie.

    A 4D scatter plot of the Budget, Season,
    Month Released and Opening Weekend Of Drama Movies.

    A 3D scatter plot and four clusters of the Budget, Season,
    Month Released and Opening Weekend Of Drama Movies.

    Section 5: The purpose of this section is to partition the movies in the Drama Genre into clusters based on the season the movie was released. Then, based on the clusters, calculate the probability of generating a certain amount of Profit, the total Profit made in each season, the average Profit made per season and which season is the most consistent in generating Profit.

    A 3D scatter plot of the Season, Budget
    and Profit Of Drama Movies.

    A 3D scatter plot and four clusters of the Season,
    Budget and Profit Of Drama Movies.
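
The per-season figures named in Section 5 (total Profit, average Profit and consistency) can be sketched as follows. This is an illustration under stated assumptions: the function `season_summary` is hypothetical, consistency is measured here by the coefficient of variation (standard deviation divided by the mean), which may differ from the criterion used in the notebook, and the records are placeholder values in millions of dollars.

```python
import statistics
from collections import defaultdict


def season_summary(records):
    """Summarise (season, profit) pairs into total, average and spread per season."""
    by_season = defaultdict(list)
    for season, profit in records:
        by_season[season].append(profit)

    summary = {}
    for season, profits in by_season.items():
        average = statistics.mean(profits)
        spread = statistics.pstdev(profits)
        summary[season] = {
            "total": sum(profits),
            "average": average,
            # Lower coefficient of variation = more consistent profits.
            "cv": spread / average if average else float("inf"),
        }
    return summary


# Placeholder records (season, profit in millions), NOT the notebook's data.
records = [("Winter", 10), ("Winter", 30), ("Autumn", 20), ("Autumn", 20)]

summary = season_summary(records)
most_consistent = min(summary, key=lambda season: summary[season]["cv"])
print(summary)
print("Most consistent season:", most_consistent)
```

In this toy input both seasons have the same total and average, but Autumn has no spread at all, so it is reported as the most consistent season.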

    Creating the bar charts for the 3D charts and displaying them above.

    In [644]:
    %%js
    Highcharts.chart('container111', {
        chart: {
            type: 'bar',
            width: 290,
            height: 370
        },
        title: {
            text: ''
        },
        subtitle: {
            text: 'Group one: The budget used for each movie is between 42 and 100 million'
        },
        xAxis: {
            categories: ['50-80 Million','80-100 Million', '100-150 Million', '150-200 Million', '200-250 Million',
                         '250-350 Million', '350-460 Million'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Revenue: (millions)|Amount of Movies: 41',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#C41E3A",
                data: [
                    {
                        name: "50-80 Million",
                        y: 7,
                        drilldown: "50-80 Million"
                    },
                    {
                        name: "80-100 Million",
                        y: 12,
                        drilldown: "80-100 Million"
                    },
                    {
                        name: "100-150 Million",
                        y: 24,
                        drilldown: "100-150 Million"
                    },
                    {
                        name: "150-200 Million",
                        y: 20,
                        drilldown: "150-200 Million"
                    },
                    {
                        name: "200-250 Million",
                        y: 5,
                        drilldown: "200-250 Million"
                    },
                    {
                        name: "250-350 Million",
                        y: 15,
                        drilldown: "250-350 Million"
                    },
                    {
                        name: "350-460 Million",
                        y: 17,
                        drilldown: "350-460 Million"
                    },
                ]
            }
        ]
    });
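
The `y` values in the chart above are percentages typed in by hand. A small Python helper could derive them from per-range counts and emit the `data` array as JSON for pasting into the chart definition. This is a sketch only: the function `highcharts_points` is hypothetical, and the counts are an assumption chosen so that they add up to the 41 movies named in the axis title and round to the percentages shown; the actual counts do not appear in this section.

```python
import json


def highcharts_points(counts_by_range):
    """Turn {range label: movie count} into Highcharts point dicts with percent y-values."""
    total = sum(counts_by_range.values())
    return [
        {"name": label, "y": round(100 * count / total)}
        for label, count in counts_by_range.items()
    ]


# Assumed counts per revenue range (sum = 41); not confirmed by the source.
group_one_counts = {
    "50-80 Million": 3,
    "80-100 Million": 5,
    "100-150 Million": 10,
    "150-200 Million": 8,
    "200-250 Million": 2,
    "250-350 Million": 6,
    "350-460 Million": 7,
}

points = highcharts_points(group_one_counts)
series_json = json.dumps(points, indent=4)
print(series_json)
```

Generating the points from counts keeps the labels and the percentages in one place, which avoids mismatches between the `categories` list and the point names.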
    
    In [645]:
    %%js
    Highcharts.chart('container222', {
        chart: {
            type: 'bar',
            width: 295,
            height: 370
        },
        title:{text:''},
        subtitle: {
            text: 'Group two: The budget used for each movie is between 100 and 400 million'
        },
        xAxis: {
            categories: ['100-150 Million', '150-200 Million', '200-250 Million', '250-350 Million', 
                         '350-450 Million', '450-500 Million', '500-650 Million', '650-800 Million',
                          '800 Million-1.5 Billion'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Revenue: (millions-billions)|Amount of Movies: 51',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#C41E3A",
                data: [
                    {
                        name: "100-150 Million",
                        y: 2,
                        drilldown: "100-150 Million"
                    },
                    {
                        name: "150-200 Million",
                        y: 4,
                        drilldown: "150-200 Million"
                    },
                    {
                        name: "200-250 Million",
                        y: 19,
                        drilldown: "200-250 Million"
                    },
                    {
                        name: "250-350 Million",
                        y: 6,
                        drilldown: "250-350 Million"
                    },
                    {
                        name: "350-450 Million",
                        y: 15,
                        drilldown: "350-450 Million"
                    },
                    {
                        name: "450-500 Million",
                        y: 9,
                        drilldown: "450-500 Million"
                    },
                    {
                        name: "500-650 Million",
                        y: 13,
                        drilldown: "500-650 Million"
                    },
                    {
                        name: "650-800 Million",
                        y: 11,
                        drilldown: "650-800 Million"
                    },
                    {
                        name:"800 Million-1.5 Billion",
                        y: 21,
                        drilldown: "800 Million-1.5 Billion"
                    },
                ]
            }
        ]
    });
    
    In [646]:
    %%js
    Highcharts.chart('container333', {
        chart: {
            type: 'bar',
            width: 290,
            height: 370
        },
        title: {
            text: ''
        },
        subtitle: {
            text: 'Group one: The budget used for each movie is between 50 and 100 million'
        },
        xAxis: {
            categories: ['60-80 Million', '80-100 Million','100-150 Million', '150-200 Million', '200-250 Million', 
                         '250-350 Million', '350-450 Million', '450-520 Million'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Revenue: (millions)|Total of Movies: 48',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#702963",
                data: [
                    {
                        name: "60-80 Million",
                        y: 10,
                        drilldown: "60-80 Million"
                    },
                    {
                        name: "80-100 Million",
                        y: 8,
                        drilldown: "80-100 Million"
                    },
                    {
                        name: "100-150 Million",
                        y: 15,
                        drilldown: "100-150 Million"
                    },
                    {
                        name: "150-200 Million",
                        y: 21,
                        drilldown: "150-200 Million"
                    },
                    {
                        name: "200-250 Million",
                        y: 15,
                        drilldown: "200-250 Million"
                    },
                    {
                        name: "250-350 Million",
                        y: 15,
                        drilldown: "250-350 Million"
                    },
                    {
                        name: "350-450 Million",
                        y: 10,
                        drilldown: "350-450 Million"
                    },
                    {
                        name:"450-520 Million",
                        y: 6,
                        drilldown: "450-520 Million"
                    },
                ]
            }
        ]
    });
    
    In [647]:
    %%js
    Highcharts.chart('container444', {
        chart: {
            type: 'bar',
            width: 290,
            height: 370
        },
        title: {
            text: ''
        },
        subtitle: {
            text: 'Group two: The budget used for each movie is between 100 and 350 million'
        },
        xAxis: {
            categories: [ '100-150 Million', '150-200 Million','200-250 Million',  '300-350 Million',
                         '350-450 Million', '550-650 Million', '650-800 Million', '800 Million-2.2 Billion'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Revenue: (millions-billions)|Total of Movies: 24',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#702963",
                data: [{
                        name: "100-150 Million",
                        y: 8,
                        drilldown: "100-150 Million"
                    },
                    {
                        name: "150-200 Million",
                        y: 12,
                        drilldown: "150-200 Million"
                    },
                    {
                        name: "200-250 Million",
                        y: 4,
                        drilldown: "200-250 Million"
                    },
                    {
                        name: "300-350 Million",
                        y: 4,
                        drilldown: "300-350 Million"
                    },
                    {
                        name: "350-450 Million",
                        y: 17,
                        drilldown: "350-450 Million"
                    },
                    {
                        name: "550-650 Million",
                        y: 8,
                        drilldown: "550-650 Million"
                    },
                    {
                        name: "650-800 Million",
                        y: 21,
                        drilldown: "650-800 Million"
                    },
                    {
                        name:"800 Million-2.2 Billion",
                        y: 25,
                        drilldown: "800 Million-2.2 Billion"
                    },
                ]
            }
        ]
    });
    
    In [648]:
    %%js
    Highcharts.chart('container555', {
        chart: {
            type: 'bar',
            width: 290,
            height: 370
        },
        title: {
            text: '  '
        },
        subtitle: {
            text: 'Group one: The Opening Weekend of each movie is between 4 and 50 million'
        },
        xAxis: {
            categories: ['10-50 Million','50-100 Million', '100-150 Million', '150-200 Million', '200-250 Million', '250-300 Million','300-450 Million', '2 Billion'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Profit: (millions-billions)|Amount of Movies: 67',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#ff4500",
                data: [{
                        name: "10-50 Million",
                        y: 20,
                        drilldown: "10-50 Million"
                    },
                    {
                        name: "50-100 Million",
                        y: 27,
                        drilldown: "50-100 Million"
                    },
                    {
                        name: "100-150 Million",
                        y: 19,
                        drilldown: "100-150 Million"
                    },
                    {
                        name: "150-200 Million",
                        y: 10,
                        drilldown: "150-200 Million"
                    },
                    {
                        name: "200-250 Million",
                        y: 8,
                        drilldown: "200-250 Million"
                    },
                    {
                        name: "250-300 Million",
                        y: 6,
                        drilldown: "250-300 Million"
                    },
                    {
                        name: "300-450 Million",
                        y: 6,
                        drilldown: "300-450 Million"
                    },
                    {
                        name: "2 Billion",
                        y: 2,
                        drilldown: "2 Billion"
                    },
                    
                ]
            }
        ]
    });
    
    In [649]:
    %%js
    Highcharts.chart('container666', {
        chart: {
            type: 'bar',
            width: 290,
            height: 370
        },
        title: {
            text: ' '
        },
        subtitle: {
            text: 'Group two: The Opening Weekend of each movie is between 50 and 250 million'
        },
        xAxis: {
            categories: ['200-300 Million', '300-350 Million', '400-450 Million', '500-550 Million', '700-800 Million', '800-900 Million',
                           '900 Million-2 Billion'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Profit: (millions-billions)|Amount of Movies: 21',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#ff4500",
                data: [
                    {
                        name: "200-300 Million",
                        y: 24,
                        drilldown: "200-300 Million"
                    },
                    {
                        name: "300-350 Million",
                        y: 24,
                        drilldown: "300-350 Million"
                    },
                    {
                        name: "400-450 Million",
                        y: 10,
                        drilldown: "400-450 Million"
                    },
                    {
                        name: "500-550 Million",
                        y: 10,
                        drilldown: "500-550 Million"
                    },
                    {
                        name: "700-800 Million",
                        y: 10,
                        drilldown: "700-800 Million"
                    },
                    {
                        name: "800-900 Million",
                        y: 10,
                        drilldown: "800-900 Million"
                    },
                    {
                        name: "900 Million-2 Billion",
                        y: 14,
                        drilldown: "900 Million-2 Billion"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [650]:
    %%js
    Highcharts.chart('container777', {
        chart: {
            type: 'bar',
            width: 290,
            height: 370
        },
        title: {
            text: ''
        },
        subtitle: {
            text: 'Group one: The Opening Weekend of each movie is between 10 and 50 million'
        },
        xAxis: {
            categories: ['10-50 Million', '50-100 Million','100-150 Million','150-200 Million',
                         '200-250 Million','250-300 Million', '300-450 Million', '550 Million', 
                           '1.3 Billion'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Profit: (millions-billions)|Amount of Movies: 51',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#960018",
                data: [
                    {
                        name: "10-50 Million",
                        y: 33,
                        drilldown: "10-50 Million"
                    },
                    {
                        name: "50-100 Million",
                        y: 10,
                        drilldown: "50-100 Million"
                    },
                    {
                        name: "100-150 Million",
                        y: 20,
                        drilldown: "100-150 Million"
                    },
                    {
                        name: "150-200 Million",
                        y: 12,
                        drilldown: "150-200 Million"
                    },
                    {
                        name: "200-250 Million",
                        y: 2,
                        drilldown: "200-250 Million"
                    },
                    {
                        name: "250-300 Million",
                        y: 6,
                        drilldown: "250-300 Million"
                    },
                    {
                        name: "300-450 Million",
                        y: 14,
                        drilldown: "300-450 Million"
                    },
                    {
                        name: "550 Million",
                        y: 2,
                        drilldown: "550 Million"
                    },
                    {
                        name: "1.3 Billion",
                        y: 2,
                        drilldown: "1.3 Billion"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [651]:
    %%js
    Highcharts.chart('container888', {
        chart: {
            type: 'bar',
            width: 290,
            height: 370
        },
        title: {
            text: ' '
        },
        subtitle: {
            text: 'Group two: The Opening Weekend of each movie is between 50 and 380 million'
        },
        xAxis: {
            categories: [ '100-200 Million','200-340 Million', '350-500 Million',
                         '500-550 Million', '550-650 Million','650-750 Million', 
                           '1.1 Billion'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Profit: (millions-billions)|Amount of Movies: 24',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#960018",
                data: [{
                        name: "100-200 Million",
                        y: 13,
                        drilldown: "100-200 Million"
                    },
                    {
                        name: "200-340 Million",
                        y: 21,
                        drilldown: "200-340 Million"
                    },
                    {
                        name: "350-500 Million",
                        y: 25,
                        drilldown: "350-500 Million"
                    },
                    {
                        name: "500-550 Million",
                        y: 8,
                        drilldown: "500-550 Million"
                    },
                    {
                        name: "550-650 Million",
                        y: 21,
                        drilldown: "550-650 Million"
                    },
                    {
                        name: "650-750 Million",
                        y: 12,
                        drilldown: "650-750 Million"
                    },
                    {
                        name: "1.1 Billion",
                        y: 4,
                        drilldown: "1.1 Billion"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [652]:
    %%js
    Highcharts.chart('container999', {
        chart: {
            type: 'bar',
            width: 290,
            height: 230
        },
        title: {
            text: ''
        },
        subtitle: {
            text: 'Group one: The Budget of each movie is between 50 and 80 million'
        },
        xAxis: {
            categories: ['3.5 Million', '10-15 Million','15-20 Million','20-32 Million',],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Opening Weekend: (millions)|Amount of Movies: 16',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#ff9999",
                data: [
                    {
                        name: "3.5 Million",
                        y: 6,
                        drilldown: "3.5 Million"
                    },
                    {
                        name: "10-15 Million",
                        y: 10,
                        drilldown: "10-15 Million"
                    },
                    {
                        name: "15-20 Million",
                        y: 20,
                        drilldown: "15-20 Million"
                    },
                    {
                        name: "20-30 Million",
                        y: 12,
                        drilldown: "20-30 Million"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [653]:
    %%js
    Highcharts.chart('container991', {
        chart: {
            type: 'bar',
            width: 290,
            height: 230
        },
        title: {
            text: ''
        },
        subtitle: {
            text: 'Group two: The Budget of each movie is between 80 and 320 million'
        },
        xAxis: {
            categories: ['8-10 Million', '10-15 Million','20-30 Million','30-40 Million',
                         '50-70 Million'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Opening Weekend: (millions)|Amount of Movies: 18',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#ff9999",
                data: [
                    {
                        name: "8-10 Million",
                        y: 11,
                        drilldown: "8-10 Million"
                    },
                    {
                        name: "10-15 Million",
                        y: 22,
                        drilldown: "10-15 Million"
                    },
                    {
                        name: "20-30 Million",
                        y: 28,
                        drilldown: "20-30 Million"
                    },
                    {
                        name: "30-40 Million",
                        y: 11,
                        drilldown: "30-40 Million"
                    },
                    {
                        name: "50-70 Million",
                        y: 17,
                        drilldown: "50-70 Million"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [654]:
    %%js
    Highcharts.chart('container992', {
        chart: {
            type: 'bar',
            width: 290,
            height: 230
        },
        title: {
            text: ''
        },
        subtitle: {
            text: 'Group one: Movies with a budget between 50 and 80 million'
        },
        xAxis: {
            categories: ['3-6 Million', '10-15 Million','15-20 Million','20-30 Million','30-40 Million',
                         ],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Opening Weekend: (millions)|Amount of Movies: 12',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#ff1919",
                data: [{
                        name: "3-6 Million",
                        y: 8,
                        drilldown: "3-6 Million"
                    },
                    {
                        name: "10-15 Million",
                        y: 25,
                        drilldown: "10-15 Million"
                    },
                    {
                        name: "15-20 Million",
                        y: 17,
                        drilldown: "15-20 Million"
                    },
                    {
                        name: "20-30 Million",
                        y: 33,
                        drilldown: "20-30 Million"
                    },
                    {
                        name: "30-40 Million",
                        y: 17,
                        drilldown: "30-40 Million"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [655]:
    %%js
    Highcharts.chart('container993', {
        chart: {
            type: 'bar',
            width: 290,
            height: 230
        },
        title:{text:''},
        subtitle: {
            text: 'Group two: Movies with a budget between 80 and 400 million'
        },
        xAxis: {
            categories: ['6-20 Million','20-30 Million','30-40 Million',
                         '40-70 Million', '70-90 Million','90-100 Million'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Opening Weekend: (millions)|Amount of Movies: 33',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#ff1919",
                data: [
                    {
                        name: "6-20 Million",
                        y: 12,
                        drilldown: "6-20 Million"
                    },
                    {
                        name: "20-30 Million",
                        y: 21,
                        drilldown: "20-30 Million"
                    },
                    {
                        name: "30-40 Million",
                        y: 12,
                        drilldown: "30-40 Million"
                    },
                    {
                        name: "40-70 Million",
                        y: 21,
                        drilldown: "40-70 Million"
                    },
                    {
                        name: "70-90 Million",
                        y: 12,
                        drilldown: "70-90 Million"
                    },
                    {
                        name: "90-100 Million",
                        y: 6,
                        drilldown: "90-100 Million"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [656]:
    %%js
    Highcharts.chart('container994', {
        chart: {
            type: 'bar',
            width: 290,
            height: 230
        },
        title: {
            text: ''
        },
        subtitle: {
            text: 'Group one: Movies with a budget between 50 and 90 million'
        },
        xAxis: {
            categories: [ '10-15 Million','15-20 Million','20-30 Million','30-40 Million','50-70 Million'
                         ],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Opening Weekend: (millions)|Amount of Movies: 28',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#990000",
                data: [
                    {
                        name: "10-15 Million",
                        y: 14,
                        drilldown: "10-15 Million"
                    },
                    {
                        name: "15-20 Million",
                        y: 21,
                        drilldown: "15-20 Million"
                    },
                    {
                        name: "20-30 Million",
                        y: 32,
                        drilldown: "20-30 Million"
                    },
                    {
                        name: "30-40 Million",
                        y: 11,
                        drilldown: "30-40 Million"
                    },
                    {
                        name: "50-70 Million",
                        y: 11,
                        drilldown: "50-70 Million"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [657]:
    %%js
    Highcharts.chart('container995', {
        chart: {
            type: 'bar',
            width: 290,
            height: 230
        },
        title:{text:''},
        subtitle: {
            text: 'Group two: Movies with a budget between 90 and 200 million'
        },
        xAxis: {
            categories: ['9-20 Million','20-30 Million','30-40 Million','40-50 Million',
                         '50-70 Million', '70-80 Million','90-140 Million'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Opening Weekend: (millions)|Amount of Movies: 29',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#990000",
                data: [
                    {
                        name: "9-20 Million",
                        y: 10,
                        drilldown: "9-20 Million"
                    },
                    {
                        name: "20-30 Million",
                        y: 28,
                        drilldown: "20-30 Million"
                    },
                    {
                        name: "30-40 Million",
                        y: 7,
                        drilldown: "30-40 Million"
                    },
                    {
                        name: "40-50 Million",
                        y: 7,
                        drilldown: "40-50 Million"
                    },
                    {
                        name: "50-70 Million",
                        y: 21,
                        drilldown: "50-70 Million"
                    },
                    {
                        name: "70-80 Million",
                        y: 7,
                        drilldown: "70-80 Million"
                    },
                    {
                        name: "90-140 Million",
                        y: 21,
                        drilldown: "90-140 Million"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [658]:
    %%js
    Highcharts.chart('container996', {
        chart: {
            type: 'bar',
            width: 290,
            height: 230
        },
        title: {
            text: ''
        },
        subtitle: {
            text: 'Group one: Movies with a budget between 50 and 80 million'
        },
        xAxis: {
            categories: [ '8-12 Million','20-30 Million','30-40 Million','40-50 Million'
                         ],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Opening Weekend: (millions)|Amount of Movies: 16',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#4d0000",
                data: [
                    {
                        name: "8-12 Million",
                        y: 19,
                        drilldown: "8-12 Million"
                    },
                    {
                        name: "20-30 Million",
                        y: 35,
                        drilldown: "20-30 Million"
                    },
                    {
                        name: "30-40 Million",
                        y: 19,
                        drilldown: "30-40 Million"
                    },
                    {
                        name: "40-50 Million",
                        y: 6,
                        drilldown: "40-50 Million"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [659]:
    %%js
    Highcharts.chart('container997', {
        chart: {
            type: 'bar',
            width: 290,
            height: 230
        },
        title:{text:''},
        subtitle: {
            text: 'Group two: Movies with a budget between 80 and 300 million'
        },
        xAxis: {
            // Categories must match the point names in the series below
            categories: ['3-13 Million','20-35 Million','50-60 Million',
                         '70-90 Million','90-130 Million'],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Opening Weekend: (millions)|Amount of Movies: 14',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#4d0000",
                data: [
                    {
                        name: "3-13 Million",
                        y: 21,
                        drilldown: "3-13 Million"
                    },
                    {
                        name: "20-35 Million",
                        y: 21,
                        drilldown: "20-35 Million"
                    },
                    {
                        name: "50-60 Million",
                        y: 14,
                        drilldown: "50-60 Million"
                    },
                    {
                        name: "70-90 Million",
                        y: 21,
                        drilldown: "70-90 Million"
                    },
                    {
                        name: "90-130 Million",
                        y: 21,
                        drilldown: "90-130 Million"
                    },
        
                    
                ]
            }
        ]
    });
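The percentage bars in the opening-weekend charts above are typed in by hand, and the `xAxis.categories` list is written separately from the point names. As a minimal sketch of one way to derive both from the same source, the helper below bins raw opening-weekend values and returns points in the `{name, y}` shape the charts use. The function `binShares`, the `min`/`max` bin fields, and the sample numbers are all hypothetical and are not taken from the notebook or its datasets.

```javascript
// Hypothetical helper: turn raw opening-weekend values (in millions) into
// percentage-share points for a Highcharts bar series.
function binShares(values, bins) {
    return bins.map(function (bin) {
        // Count the values that fall inside [min, max)
        var count = values.filter(function (v) {
            return v >= bin.min && v < bin.max;
        }).length;
        return {
            name: bin.name,
            // Share of all movies, rounded to the nearest percent
            y: Math.round(100 * count / values.length)
        };
    });
}

// Made-up sample values, not taken from the project's datasets
var sampleOpenings = [9, 12, 14, 22, 25, 28, 35, 36, 55, 60];
var sampleBins = [
    { name: '8-10 Million', min: 8, max: 10 },
    { name: '10-15 Million', min: 10, max: 15 },
    { name: '20-30 Million', min: 20, max: 30 },
    { name: '30-40 Million', min: 30, max: 40 },
    { name: '50-70 Million', min: 50, max: 70 }
];

var points = binShares(sampleOpenings, sampleBins);
// Reuse the point names as axis categories so the two cannot drift apart
var categories = points.map(function (p) { return p.name; });
```

Under these assumptions, `points` could be passed as the series `data` and `categories` as `xAxis.categories`, so a label is only ever written once.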
    
    In [660]:
    %%js
    function dollarFormat(x) {
        return '$' + Highcharts.numberFormat(x, 0, '.', ',');
    }
    
    var colors = Highcharts.getOptions().colors;
    
    Highcharts.chart('container998', {
        chart: {
            type: 'column',
            inverted: false,
            height: 400
        },
    
        accessibility: {
            series: {
                descriptionFormatter: function (series) {
                    return series.type === 'line' ?
                        series.name + ', ' + dollarFormat(series.points[0].y) :
                        series.name + ' movie profits, bar series with ' +
                        series.points.length + ' bars.';
                }
            },
            point: {
                valuePrefix: '$'
            },
            keyboardNavigation: {
                seriesNavigation: {
                    mode: 'serialize'
                }
            }
        },
    
        title: {
            text: 'The total profit of movies in the Drama genre that were released in each season',
            margin: 35
        },
    
        subtitle: {
            text: 'There are four seasons in a year: Winter (December, January, February), Spring (March, April, May), Summer (June, July, August), Autumn (September, October, November)'
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
    
        xAxis: {
            visible: false,
            accessibility: {
                description: 'Movies',
                rangeDescription: ''
            }
        },
    
        yAxis: [{
            min: 0,
            max: 2000000000,
            labels: {
                format: '${text}'
            },
            title: {
                text: 'Movie profit'
            },
            gridLineWidth: 1
        }, {
            accessibility: {
                description: 'Individual movie total'
            },
            opposite: true,
            min: 0,
            max: 14000000000,
            gridLineWidth: 0,
            labels: {
                format: '${text}',
                style: {
                    color: '#8F6666'
                }
            },
            title: {
                text: 'Season total',
                style: {
                    color: '#8F6666'
                }
            }
        }],
    
        credits: {
            enabled: false
        },
    
        plotOptions: {
            column: {
                keys: ['name', 'y'],
                grouping: false,
                pointPadding: 0.1,
                groupPadding: 0,
                tooltip: {
                    headerFormat: '<span style="font-size: 10px">' +
                        '<span style="color:{point.color}">\u25CF</span> ' +
                        '{series.name}</span><br/>',
                    pointFormat: '{point.name}: <b>${point.y:,.0f}</b><br/>'
                }
            },
            line: {
                yAxis: 1,
                lineWidth: 5,
                accessibility: {
                    exposeAsGroupOnly: true
                },
                marker: {
                    enabled: false
                },
                enableMouseTracking: false,
                linkedTo: ':previous',
                dataLabels: {
                    enabled: true,
                    verticalAlign: 'bottom',
                    style: {
                        color: '#757575',
                        fontWeight: 'normal'
                    },
                    formatter: function () {
                        if (this.point === this.series.points[Math.floor(
                            this.series.points.length / 2
                        )]) {
                            return 'Total: $' + Highcharts.numberFormat(this.y, 0);
                        }
                    }
                }
            }
        },
    
        responsive: {
            rules: [{
                condition: {
                    maxWidth: 400
                },
                chartOptions: {
                    chart: {
                        spacingLeft: 3,
                        spacingRight: 5
                    },
                    yAxis: [{}, {
                        visible: false
                    }]
                }
            }]
        },
    
        series: [{
            name: ' Winter',
            color: "#ed93cd",
            borderColor: '#A59273',
            borderWidth: 1,
            data: [
               [ ' The Adventures of Tintin ' , 243993951 ],
               [ ' Spider-Man: Into The Spider-Verse 3D ' , 285381768 ],
               [ ' Alvin and the Chipmunks: The Road Chip ' , 159517956 ],
               [ ' Star Wars Ep. VII: The Force Awakens ' , 1747311220 ],
               [ ' Fool\'s Gold ' , 36862966 ],
               [ ' Alvin and the Chipmunks: The Squeakquel ' , 373483213 ],
               [ ' Hook ' , 230854823 ],
               [ ' Rumor Has It ' , 18933562 ],
               [ ' Hall Pass ' , 19173475 ],
               [ ' Titanic ' , 2008208395 ],
               [ ' Aquaman ' , 986894640 ],
               [ ' Edge of Darkness ' , 22812456 ],
               [ ' It\'s Complicated ' , 139614744 ],
               [ ' Alvin and the Chipmunks: Chipwrecked ' , 269088523 ],
               [ ' The Tale of Despereaux ' , 30482317 ],
               [ ' Seven Pounds ' , 112617328 ],
               [ ' Stepmom ' , 109745279 ],
               [ ' Sherlock Holmes ' , 408438212 ],
               [ ' Escape Plan ' , 33735965 ],
               [ ' Les Miserables ' , 377169052 ],
               [ ' Unbroken ' , 98527824 ],
               [ ' Broken Arrow ' , 83345997 ],
               [ ' The Hateful Eight ' , 85864886 ],
               [ ' Kangaroo Jack ' , 30723216 ],
               [ ' Star Wars Ep. VIII: The Last Jedi ' , 999721747 ],
               [ ' King Kong ' , 343517357 ],
               [ ' Mission: Impossible—Ghost Protocol ' , 549713230 ],
               [ ' Happy Feet Two ' , 22956466 ],
               [ ' Australia ' , 85080810 ],
               [ ' Blood Diamond ' , 71377916 ],
               [ ' The Girl with the Dragon Tattoo ' , 149373970 ],
               [ ' Valkyrie ' , 113932174 ],
               [ ' Ocean\'s Eleven ' , 365728529 ],
               ]
        }, {
            type: 'line',
            name: ' Winter',
            data: [
                10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,
                10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,
                10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,
                10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,
                10614183967
            ],
            color: "#ed93cd"
        }, {
            name: ' Spring',
            color: "#DA70D6",
            data: [
                [ ' Pirates of the Caribbean: On Stranger Tides ' , 635063875 ],
                [ ' Avengers: Age of Ultron ' , 1072413963 ],
                [ ' Pirates of the Caribbean: At World\'s End ' , 663420425 ],
                [ ' Solo: A Star Wars Story ' , 118151347 ],
                [ ' Pirates of the Caribbean: Dead Men Tell No Tales ' , 558241137 ],
                [ ' Indiana Jones and the Kingdom of the Crystal Skull ' , 601635413 ],
                [ ' Shrek the Third ' , 647330936 ],
                [ ' Dark Shadows ' , 88202668 ],
                [ ' The Croods ' , 438068425 ],
                [ ' Logan ' , 488461394 ],
                [ ' Gladiator ' , 354683805 ],
                [ ' Wonder Park ' , 15149422 ],
                [ ' Die Hard: With a Vengeance ' , 276101666 ],
                [ ' Tomb Raider ' , 183477501 ],
                [ ' Divergent ' , 191014965 ],
                [ ' Tomorrowland ' , 36627518 ],
                [ ' Kung Fu Panda 3 ' , 377599142 ],
                [ ' The Day After Tomorrow ' , 431319450 ],
                [ ' Power Rangers ' , 22531552 ],
                [ ' Kingdom of Heaven ' , 108853353 ],
                [ ' The Sum of All Fears ' , 125500000 ],
                [ ' The Dictator ' , 115148897 ],
                [ ' Rambo III ' , 130715611 ],
                [ ' The Adjustment Bureau ' , 76731325 ],
                [ ' Inside Man ' , 135798265 ],
                [ ' Fever Pitch ' , 10071069 ],
                [ ' Spider-Man 3 ' , 636860230 ],
                [ ' Thor ' , 299326618 ],
                [ ' Rango ' , 110724600 ],
                [ ' The Mummy Returns ' , 337040395 ],
                [ ' Need for Speed ' , 128169619 ],
                [ ' The Matrix ' , 398517383 ],
                [ ' 300 ' , 394161935 ],
                [ ' Wild Hogs ' , 193555383 ],
                [ ' London Has Fallen ' , 135194085 ],
                [ ' Hellboy ' , 39823958 ],
                [ ' Jack the Giant Slayer ' , 2687603 ],
                [ ' Furious 7 ' , 1328722794 ],
                [ ' Star Trek Into Darkness ' , 277381584 ],
                [ ' Monsters vs. Aliens ' , 206687380 ],
                [ ' Poseidon ' , 21674817 ],
                [ ' Fast Five ' , 505163454 ],
                [ ' Godzilla ' , 251000000 ],
                [ ' Epic ' , 162794441 ],
                [ ' Volcano ' , 30100000 ],
            ],
            pointStart: 36
        }, {
            type: 'line',
            name: ' Spring',
            data: [
                13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,
                13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,
                13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,
                13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,
                13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,
                13361899403,13361899403,13361899403,13361899403,13361899403
            ],
            pointStart: 36,
            color:"#DA70D6"
        }, {
            name: ' Summer',
            color: "#800080",
            data: [
                [ ' Pirates of the Caribbean: Dead Man\'s Chest ' , 841215812 ],
                [ ' Pacific Rim ' , 221002906 ],
                [ ' Spider-Man: Homecoming ' , 705166350 ],
                [ ' Transformers ' , 557272592 ],
                [ ' The Last Airbender ' , 169713881 ],
                [ ' Skyscraper ' , 179115534 ],
                [ ' Public Enemies ' , 109782709 ],
                [ ' Rush Hour 2 ' , 257425832 ],
                [ ' Percy Jackson: Sea of Monsters ' , 110859554 ],
                [ ' The Bourne Supremacy ' , 226001124 ],
                [ ' Fast & Furious ' , 278064265 ],
                [ ' Man of Steel ' , 442999518 ],
                [ ' Mission: Impossible Rogue Nation ' , 538858992 ],
                [ ' Atlantis: The Lost Empire ' , 96049020 ],
                [ ' Who Framed Roger Rabbit? ' , 281500000 ],
                [ ' Ocean\'s 8 ' , 227115976 ],
                [ ' AVP: Alien Vs. Predator ' , 102543519 ],
                [ ' Total Recall ' , 196400000 ],
                [ ' Step Brothers ' , 63468793 ],
                [ ' Pete\'s Dragon ' , 72768975 ],
                [ ' Salt ' , 160650494 ],
                [ ' Die Hard 2 ' , 169814025 ],
                [ ' World Trade Center ' , 98295654 ],
                [ ' The Dark Tower ' , 53461527 ],
                [ ' Planes: Fire and Rescue ' , 106399644 ],
                [ ' Bedazzled ' , 42376224 ],
                [ ' Cleopatra ' , 29000000 ],
                [ ' Legal Eagles ' , 9851591 ],
                [ ' The Skeleton Key ' , 52256918 ],
                [ ' The Mummy: Tomb of the Dragon Emperor ' , 230760225 ],
                [ ' The Sorcerer\'s Apprentice ' , 57986320 ],
                [ ' Harry Potter and the Order of the Phoenix ' , 793076457 ],
                [ ' The Bourne Ultimatum ' , 314043396 ],
                [ ' Prometheus ' , 277448265 ],
                [ ' RoboCop ' , 122981799 ],
                [ ' The Smurfs ' , 453749323 ],
                [ ' Seabiscuit ' , 62715342 ],
                [ ' Abraham Lincoln: Vampire Hunter ' , 69989730 ],
                [ ' Space Cowboys ' , 63874043 ],
                [ ' Death Race ' , 7516819 ],
                [ ' 2 Guns ' , 71493015 ],
                [ ' War for the Planet of the Apes ' , 337592267 ],
                [ ' Charlie and the Chocolate Factory ' , 325825484 ],
                [ ' Ghostbusters ' , 85008658 ],
                [ ' Harry Potter and the Sorcerer\'s Stone ' , 850047606 ],
                [ ' The Wolverine ' , 301456852 ],
                [ ' The Patriot ' , 105300000 ],
                [ ' True Lies ' , 265300000 ],
                [ ' Point Break ' , 26704591 ],
                [ ' Artificial Intelligence: AI ' , 145900000 ],
                [ ' Fantastic Four ' , 245632750 ]
            ],
            pointStart: 83
        }, {
            type: 'line',
            name: ' Summer',
            data: [
                11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
                11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
                11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
                11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
                11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
                11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
                11613834371,11613834371,11613834371
            ],
            pointStart: 83,
            color: "#800080"
        }, {
            name: ' Autumn',
            color: "#780d53",
            data: [
                [ ' Justice League ' , 355945209 ],
                [ ' Toy Story 2 ' , 421358276 ],
                [ ' Thor: Ragnarok ' , 666980024 ],
                [ ' Moana ' , 487517365 ],
                [ ' The World is Not Enough ' , 226730660 ],
                [ ' The Twilight Saga: Breaking Dawn, Part 1 ' , 561920051 ],
                [ ' Ender\'s Game ' , 17983283 ],
                [ ' The Departed ' , 199660619 ],
                [ ' The Kingdom ' , 14009602 ],
                [ ' Sleepy Hollow ' , 137068340 ],
                [ ' The Boxtrolls ' , 51946251 ],
                [ ' First Man ' , 45203825 ],
                [ ' Shark Tale ' , 296917043 ],
                [ ' Daddy\'s Home 2 ' , 105807183 ],
                [ ' Murder on the Orient Express ' , 290922730 ],
                [ ' The Peacemaker ' , 12967368 ],
                [ ' The One ' , 23689126 ],
                [ ' The Intern ' , 157115710 ],
                [ ' Allied ' , 13266661 ],
                [ ' I, Frankenstein ' , 9575290 ],
                [ ' Money Train ' , 9224232 ],
                [ ' Everest ' , 156297061 ],
                [ ' Gone Girl ' , 307567189 ],
                [ ' Jack Reacher: Never Go Back ' , 99946489 ],
                [ ' The Jackal ' , 99356941 ],
                [ ' Hugo ' , 47784 ],
                [ ' Doctor Strange ' , 511404566 ],
                [ ' Big Hero 6 ' , 487127828 ],
                [ ' Harry Potter and the Goblet of Fire ' , 747099794 ],
                [ ' Bolt ' , 178015029 ],
            ],
            pointStart: 137
        }, {
            type: 'line',
            name: 'Autumn',
            data: [
                6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,
                6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,
                6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,
                6692671529,6692671529,6692671529,6692671529,6692671529,6692671529
            ],
            pointStart: 137,
            color: "#780d53"
        }]
    });
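Each season in the chart above pairs a column series of `[name, profit]` rows with a flat line that repeats the season total once per column. As a rough sketch of how those two pieces could be generated rather than typed out, the helpers below map a month number to the season grouping named in the subtitle and build the repeated-total array from the column rows. Both `seasonOf` and `seasonTotalLine` are hypothetical names, and the sample rows are made up.

```javascript
// Hypothetical helper (not from the notebook): map a 1-12 month number to the
// season grouping named in the chart subtitle.
function seasonOf(month) {
    if (month === 12 || month <= 2) { return 'Winter'; }
    if (month <= 5) { return 'Spring'; }
    if (month <= 8) { return 'Summer'; }
    return 'Autumn';
}

// Hypothetical helper: sum the [name, profit] pairs of a column series and
// repeat that total once per column, giving the flat "season total" line.
function seasonTotalLine(columnData) {
    var total = columnData.reduce(function (sum, row) {
        return sum + row[1];
    }, 0);
    return new Array(columnData.length).fill(total);
}

// Made-up sample rows, not taken from the project's datasets
var sampleColumns = [['Movie A', 100], ['Movie B', 250], ['Movie C', 50]];
var sampleLine = seasonTotalLine(sampleColumns);
```

Generating the line this way keeps its length and value tied to the column data, so adding or removing a movie would not require editing a second array by hand.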
    
    In [661]:
    %%js
    Highcharts.chart('container101', {
        chart: {
            type: 'bar',
            width: 290,
            height: 330
        },
        title: {
            text: 'Winter'
        },
        subtitle: {
            text: 'The profit of the movies that were released in the winter'
        },
        xAxis: {
            categories: [ '20-50 Million','50-100 Million','100-200 Million','200-300 Million',
                         '300-400 Million','400-550 Million','900 Million-2.1 Billion'
                         ],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Profit: (millions-Billions)|Amount of Movies: 33',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#ed93cd",
                data: [
                    {
                        name: "20-50 Million",
                        y: 24,
                        drilldown: "20-50 Million"
                    },
                    {
                        name: "50-100 Million",
                        y: 15,
                        drilldown: "50-100 Million"
                    },
                    {
                        name: "100-200 Million",
                        y: 18,
                        drilldown: "100-200 Million"
                    },
                    {
                        name: "200-300 Million",
                        y: 12,
                        drilldown: "200-300 Million"
                    },
                    {
                        name: "300-400 Million",
                        y: 12,
                        drilldown: "300-400 Million"
                    },
                    {
                        name: "400-550 Million",
                        y: 6,
                        drilldown: "400-550 Million"
                    },
                    {
                        name: "900 Million-2.1 Billion",
                        y: 12,
                        drilldown: "900 Million-2.1 Billion"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [662]:
    %%js
    Highcharts.chart('container102', {
        chart: {
            type: 'bar',
            width: 290,
            height: 330
        },
        title: {
            text: 'Spring'
        },
        subtitle: {
            text: 'The profit of the movies that were released in the Spring'
        },
        xAxis: {
            categories: [ '3-40 Million','50-100 Million','100-200 Million','200-300 Million',
                         '300-400 Million','400-500 Million','500-670 Million','1-1.3 Billion'
                         ],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Profit: (millions-Billions)|Amount of Movies: 45',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#DA70D6",
                data: [
                    {
                        name: "3-40 Million",
                        y: 18,
                        drilldown: "3-40 Million"
                    },
                    {
                        name: "50-100 Million",
                        y: 4,
                        drilldown: "50-100 Million"
                    },
                    {
                        name: "100-200 Million",
                        y: 29,
                        drilldown: "100-200 Million"
                    },
                    {
                        name: "200-300 Million",
                        y: 11,
                        drilldown: "200-300 Million"
                    },
                    {
                        name: "300-400 Million",
                        y: 11,
                        drilldown: "300-400 Million"
                    },
                    {
                        name: "400-500 Million",
                        y: 6,
                        drilldown: "400-500 Million"
                    },
                    {
                        name: "500-670 Million",
                        y: 15,
                        drilldown: "500-670 Million"
                    },
                    {
                        name: "1-1.3 Billion",
                        y: 4,
                        drilldown: "1-1.3 Billion"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [663]:
    %%js
    Highcharts.chart('container103', {
        chart: {
            type: 'bar',
            width: 290,
            height: 330
        },
        title: {
            text: 'Summer'
        },
        subtitle: {
            text: 'The profit of the movies that were released in the Summer'
        },
        xAxis: {
            categories: [ '8-50 Million','50-100 Million','100-200 Million','200-300 Million',
                         '300-400 Million','400-500 Million','500-600 Million','700-800 Million','800-850 Million'
                         ],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Profit: (millions)|Amount of Movies: 51',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#800080",
                data: [
                    {
                        name: "8-50 Million",
                        y: 10,
                        drilldown: "8-50 Million"
                    },
                    {
                        name: "50-100 Million",
                        y: 24,
                        drilldown: "50-100 Million"
                    },
                    {
                        name: "100-200 Million",
                        y: 24,
                        drilldown: "100-200 Million"
                    },
                    {
                        name: "200-300 Million",
                        y: 20,
                        drilldown: "200-300 Million"
                    },
                    {
                        name: "300-400 Million",
                        y: 8,
                        drilldown: "300-400 Million"
                    },
                    {
                        name: "400-500 Million",
                        y: 4,
                        drilldown: "400-500 Million"
                    },
                    {
                        name: "500-600 Million",
                        y: 4,
                        drilldown: "500-600 Million"
                    },
                    {
                        name: "700-800 Million",
                        y: 4,
                        drilldown: "700-800 Million"
                    },
                    {
                        name: "800-850 Million",
                        y: 4,
                        drilldown: "800-850 Million"
                    },
        
                    
                ]
            }
        ]
    });
    
    In [664]:
    %%js
    Highcharts.chart('container104', {
        chart: {
            type: 'bar',
            width: 290,
            height: 330
        },
        title: {
            text: 'Autumn'
        },
        subtitle: {
            text: 'The profit of the movies that were released in the Autumn'
        },
        xAxis: {
            categories: [ '50 Thousand-50 Million','50-100 Million','100-200 Million','200-300 Million',
                         '300-400 Million','400-500 Million','500-600 Million','700-850 Million',
                         
                         ],
            title: {
                text: null
            }
        },
        yAxis: {
            min: 0,
            title: {
                text: 'Profit: (millions)|Amount of Movies: 30',
                align: 'high'
            },
            labels: {
                overflow: 'justify'
            }
        },
        legend: { 
            enabled: false
        },
        tooltip: {
            valueSuffix: '%'
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                borderWidth: 0,
                dataLabels: {
                    enabled: true,
                    format: '{point.y:.1f}%'
                }
            }
        },
        credits: {
            enabled: false
        },
        series: [
            {
                name: "Probability",
                color: "#780d53",
                data: [
                    {
                        name: "50 Thousand-50 Million",
                        y: 30,
                        drilldown: "50 Thousand-50 Million"
                    },
                    {
                        name: "50-100 Million",
                        y: 10,
                        drilldown: "50-100 Million"
                    },
                    {
                        name: "100-200 Million",
                        y: 20,
                        drilldown: "100-200 Million"
                    },
                    {
                        name: "200-300 Million",
                        y: 10,
                        drilldown: "200-300 Million"
                    },
                    {
                        name: "300-400 Million",
                        y: 13,
                        drilldown: "300-400 Million"
                    },
                    {
                        name: "400-500 Million",
                        y: 7,
                        drilldown: "400-500 Million"
                    },
                    {
                        name: "500-600 Million",
                        y: 7,
                        drilldown: "500-600 Million"
                    },
                    {
                    
                        name: "700-850 Million",
                        y: 13,
                        drilldown: "700-850 Million"
                    },
        
                    
                ]
            }
        ]
    });
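
    The percentages in the four seasonal charts above appear to be the share of each season's movies that falls into each profit bracket. A minimal sketch of how such shares could be computed in Python follows. The function name `bracket_shares`, the bracket edges and the sample profits are hypothetical and are not taken from the notebook.

```python
from bisect import bisect_right


def bracket_shares(profits, edges, labels):
    """Return the percentage of movies that falls into each profit bracket.

    `edges` are the ascending boundaries between brackets, so there must be
    exactly one more label than there are edges.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly len(edges) + 1 labels")
    if not profits:
        raise ValueError("profits must not be empty")

    counts = [0] * len(labels)
    for profit in profits:
        # bisect_right gives the bracket index; a profit equal to an edge
        # is counted in the higher bracket.
        counts[bisect_right(edges, profit)] += 1

    total = len(profits)
    return {label: round(100 * count / total) for label, count in zip(labels, counts)}


# Hypothetical sample data, for illustration only.
sample_profits = [10_000_000, 20_000_000, 60_000_000, 200_000_000]
sample_edges = [50_000_000, 100_000_000]
sample_labels = ["0-50 Million", "50-100 Million", "100+ Million"]
shares = bracket_shares(sample_profits, sample_edges, sample_labels)
print(shares)
```

    Because each share is rounded separately, the values for one season can add up to slightly more or less than 100.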
    

    Blueprint: Revenue of Movies¶

    This is the blueprint for creating the fourth visualization, Revenue of Movies. Altair will be used to create this graph.

    Blueprint:

    • The format of the dataframe needed for this graph is the same as the previous dataframe that was created.

    • The style of this graph is the Dot Dash Plot, which is found in Altair's Example Gallery. It is a scatter plot with an x-axis and a y-axis. The x-axis shows the revenue of each movie and the y-axis is not visible. Dot Dash plots are scatter plots with tick marks that portray the fine-grained spread of the items in each category within the selection. To create a selection, drag the mouse to draw a box. When the mouse hovers over a point, it shows the name, system rating and revenue of the movie.

    Blueprint: Budget of Movies¶

    This is the blueprint for creating the fifth visualization, Budget of Movies. Altair will be used to create this graph.

    Blueprint:

    • The format of the dataframe needed for this graph is the same as the previous dataframe that was created.

    • The style of this graph is the Dot Dash Plot, which is found in Altair's Example Gallery. It is a scatter plot with an x-axis and a y-axis. The y-axis shows the budget of each movie and the x-axis is not visible. Dot Dash plots are scatter plots with tick marks that portray the fine-grained spread of the items in each category within the selection. To create a selection, drag the mouse to draw a box. When the mouse hovers over a point, it shows the name, system rating and budget of the movie.

    This is the 'Drama_DataFrame' dataframe.

    In [665]:
    Drama_DataFrame
    
    Out[665]:
    Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x Worldwide_Gross Worldwide_Gross_x Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer

    Getting the Budget of all the 'R-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.

    In [305]:
    budget = []
    for i, x in enumerate(Drama_DataFrame.Rating):
        if x == 'R':
            budget.append(Drama_DataFrame.Production_Budget[i])
    print(budget)
    
    [150000000.0, 100000000.0, 68000000.0, 61000000.0, 60000000.0, 55000000.0, 55000000.0, 55000000.0, 52500000.0, 40000000.0, 37500000.0, 35000000.0, 31000000.0, 25000000.0, 23000000.0, 22500000.0, 22500000.0, 22000000.0, 21000000.0, 20000000.0, 20000000.0, 20000000.0, 18000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 13000000.0, 13000000.0, 13000000.0, 13000000.0, 12000000.0, 12000000.0, 12000000.0, 11800000.0, 11000000.0, 10000000.0, 10000000.0, 9400000.0, 8500000.0, 7000000.0, 7000000.0, 5000000.0, 5000000.0, 4900000.0, 4750000.0, 4000000.0, 4000000.0, 3500000.0, 3400000.0, 3300000.0, 3000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 1987650.0, 1500000.0, 1000000.0, 1000000.0, 1000000.0, 1000000.0, 250000.0, 135000.0, 100000.0, 6000000.0, 8500000.0, 20000000.0, 100000.0, 26000000.0, 6500000.0, 22000000.0, 2700000.0, 11500000.0, 9000000.0]
    

    Getting the Budget of all the 'PG-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.

    In [306]:
    budget1 = []
    for i, x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG':
            budget1.append(Drama_DataFrame.Production_Budget[i])
    print(budget1)
    
    [180000000.0, 37000000.0, 31000000.0, 20000000.0, 20000000.0, 3000000.0, 1700000.0, 5100000.0, 10000000.0, 95000000.0, 3000000.0, 20000000.0, 40000000.0, 5000000.0, 422000.0, 5100000.0, 72000000.0, 11800000.0, 15000000.0, 32000000.0, 40000000.0, 65000000.0, 8000000.0, 9000000.0, 17000000.0, 30000000.0, 500000.0, 20000000.0, 11000000.0, 2000000.0, 23000000.0, 45000000.0, 15000000.0, 10000000.0, 32000000.0, 90000000.0, 10000000.0, 27000000.0, 16000000.0, 3000000.0, 15000000.0, 25000000.0, 34000000.0, 10000000.0, 20000000.0, 15000000.0, 12000000.0, 5000000.0, 7000000.0, 14000000.0, 15000000.0, 12000000.0, 28300000.0, 8000000.0, 7500000.0, 17000000.0, 5000000.0, 9000000.0, 15000000.0, 22000000.0, 5000000.0, 4500000.0, 4500000.0, 8000000.0, 16000000.0, 8200000.0, 28000000.0]
    

    Getting the Budget of all the 'G-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.

    In [307]:
    budget2 = []
    for i, x in enumerate(Drama_DataFrame.Rating):
        if x == 'G':
            budget2.append(Drama_DataFrame.Production_Budget[i])
    print(budget2)
    
    [35446775.0, 700000.0, 8600000.0, 7000000.0, 18000000.0, 4400000.0, 17000000.0, 22000000.0, 20000000.0, 23000000.0, 15000000.0, 2700000.0, 70000000.0, 30000000.0, 2500000.0, 90000000.0, 666000.0, 85000000.0, 17000000.0, 10000000.0, 22000000.0, 18000000.0, 8200000.0, 60000000.0, 45000000.0, 858000.0, 17000000.0, 300000.0, 10000000.0, 6400000.0, 13000000.0, 1750000.0, 1700000.0, 3000000.0]
    

    Getting the Budget of all the 'PG-13 rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.

    In [308]:
    budget3 = []
    for i, x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG-13':
            budget3.append(Drama_DataFrame.Production_Budget[i])
    print(budget3)
    
    [110000000.0, 75000000.0, 60000000.0, 60000000.0, 55000000.0, 50000000.0, 50000000.0, 50000000.0, 50000000.0, 50000000.0, 49000000.0, 47000000.0, 44000000.0, 40000000.0, 40000000.0, 40000000.0, 40000000.0, 38000000.0, 37000000.0, 37000000.0, 36000000.0, 35000000.0, 35000000.0, 34000000.0, 33000000.0, 30000000.0, 30000000.0, 30000000.0, 28000000.0, 27500000.0, 26000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 24000000.0, 21000000.0, 20000000.0, 20000000.0, 19000000.0, 18000000.0, 18000000.0, 17000000.0, 17000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 15000000.0, 14000000.0, 13000000.0, 12000000.0, 12000000.0, 11000000.0, 11000000.0, 10000000.0, 10000000.0, 9700000.0, 9000000.0, 9000000.0, 7400000.0, 7000000.0, 6000000.0, 5000000.0, 5000000.0, 5000000.0, 5000000.0, 4500000.0, 4357373.0, 2600000.0, 2000000.0, 1400000.0, 250000.0, 175000.0]
    

    Getting the Budget of all the 'NC-17 rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.

    In [309]:
    budget4 = []
    for i, x in enumerate(Drama_DataFrame.Rating):
        if x == 'NC-17':
            budget4.append(Drama_DataFrame.Production_Budget[i])
    print(budget4)
    
    [6500000.0, 12500000.0, 1000000.0, 20000.0, 955472.0, 1500000.0, 45000000.0, 9000000.0, 5000000.0, 15000000.0, 2734384.0, 15000000.0, 6500000.0, 4000000.0, 45000000.0, 15000000.0, 6500000.0, 4074940.0, 1000000.0, 1000000.0, 3565572.0, 12000000.0, 10000000.0, 15000000.0, 19000000.0, 350000.0, 1000000.0, 6500000.0, 4700000.0, 904765.0, 3000000.0, 700000.0, 34000000.0, 230000.0, 1000000.0, 3200000.0, 1000000.0, 1500000.0, 6500000.0, 1250000.0, 12000.0, 15000000.0, 2200000.0, 1300000.0, 15000000.0, 6400000.0, 50000.0, 3259572.0, 612072.0]
    

    Getting the Star Ratings of all the 'R-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.

    In [310]:
    rating = []
    for i, x in enumerate(Drama_DataFrame.Rating):
        if x == 'R':
            rating.append(Drama_DataFrame.Averagerating[i])
    print(rating)
    
    [5.8, 8.4, 5.7, 8.1, 5.7, 4.6, 4.5, 6.5, 7.4, 4.1, 7.1, 7.5, 7.3, 6.2, 7.1, 7.5, 7.1, 5.6, 6.1, 6.9, 7.1, 5.3, 7.5, 6.6, 6.6, 7.1, 6.9, 8.0, 7.7, 8.2, 6.9, 7.2, 6.6, 6.8, 7.2, 6.8, 7.3, 8.7, 7.2, 7.8, 7.5, 7.0, 5.2, 6.4, 8.1, 7.4, 7.9, 6.5, 6.8, 7.1, 8.5, 7.9, 5.3, 7.2, 7.6, 6.2, 7.1, 4.9, 7.0, 6.4, 7.4, 6.9, 6.2, 7.4, 3.8, 6.6, 6.8, 7.7, 6.6, 4.9, 6.3, 6.5, 5.5, 6.5, 6.8, 5.9, 6.8]
    

    Getting the Star Ratings of all the 'PG-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.

    In [311]:
    rating1 = []
    for i, x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG':
            rating1.append(Drama_DataFrame.Averagerating[i])
    print(rating1)
    
    [7.5, 6.9, 6.5, 8.0, 6.0, 6.5, 7.8, 7.2, 6.4, 6.9, 6.5, 8.0, 7.8, 6.6, 5.9, 6.1, 6.9, 7.3, 6.6, 6.8, 6.8, 7.1, 7.4, 7.3, 7.1, 7.5, 6.5, 6.0, 6.4, 4.7, 7.3, 6.0, 6.7, 6.1, 6.4, 7.5, 7.2, 6.8, 7.7, 7.5, 7.8, 7.6, 7.2, 7.0, 6.3, 6.9, 7.2, 6.3, 7.3, 6.8, 7.6, 6.9, 7.3, 6.1, 6.0, 6.8, 6.5, 5.7, 6.1, 4.7, 6.9, 7.4, 7.0, 6.1, 6.1, 6.6, 7.5]
    

    Getting the Star Ratings of all the 'G-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.

    In [312]:
    rating2 = []
    for i, x in enumerate(Drama_DataFrame.Rating):
        if x == 'G':
            rating2.append(Drama_DataFrame.Averagerating[i])
    print(rating2)
    
    [7.2, 7.6, 7.3, 6.4, 7.3, 7.8, 7.7, 6.9, 8.0, 6.3, 6.5, 7.4, 7.0, 9.6, 9.0, 5.8, 7.1, 6.3, 7.6, 6.5, 6.9, 7.3, 8.1, 6.1, 8.5, 7.3, 7.8, 6.6, 8.1, 7.6, 7.9, 7.7, 6.3, 7.1]
    

    Getting the Star Ratings of all the 'PG-13 rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.

    In [313]:
    rating3 = []
    for i, x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG-13':
            rating3.append(Drama_DataFrame.Averagerating[i])
    print(rating3)
    
    [7.7, 7.1, 6.6, 6.8, 6.4, 7.2, 7.2, 6.5, 6.0, 6.6, 6.6, 7.9, 6.5, 7.6, 7.6, 5.7, 6.0, 6.9, 5.8, 6.0, 6.8, 7.6, 6.8, 7.1, 6.5, 6.8, 7.2, 6.4, 6.7, 6.9, 6.7, 8.1, 6.3, 6.5, 6.5, 6.8, 4.5, 7.2, 6.7, 7.4, 7.2, 7.6, 6.9, 6.6, 6.6, 5.6, 4.9, 7.1, 6.4, 6.3, 7.0, 6.9, 8.0, 6.4, 5.0, 6.8, 7.5, 6.4, 7.4, 7.9, 6.1, 6.6, 4.3, 5.6, 7.1, 6.4, 7.5, 6.4, 7.0, 6.4, 6.5, 7.4, 7.0, 7.6, 6.7, 7.0]
    

    Getting the Star Ratings of all the 'NC-17 rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.

    In [314]:
    rating4 = []
    for i, x in enumerate(Drama_DataFrame.Rating):
        if x == 'NC-17':
            rating4.append(Drama_DataFrame.Averagerating[i])
    print(rating4)
    
    [7.2, 7.0, 5.6, 6.0, 5.7, 7.1, 4.9, 6.4, 7.2, 7.2, 5.1, 7.5, 7.2, 7.7, 4.9, 7.1, 7.2, 7.7, 7.4, 5.5, 5.6, 5.9, 6.7, 7.5, 7.1, 7.4, 7.4, 7.2, 6.9, 6.7, 6.2, 6.4, 7.2, 7.7, 7.0, 6.2, 6.1, 7.0, 7.8, 6.9, 6.0, 7.5, 7.7, 6.1, 5.1, 6.4, 5.5, 5.0, 7.1]
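
    The ten cells above repeat the same loop once per rating and per column. As a sketch, the same lists could be collected in a single pass. The helper `group_by_rating` and the sample values are hypothetical, and plain lists stand in for the 'Rating' and 'Production_Budget' columns of 'Drama_DataFrame'.

```python
from collections import defaultdict


def group_by_rating(ratings, values):
    """Collect `values` into one list per rating, keeping the original order."""
    groups = defaultdict(list)
    for rating, value in zip(ratings, values):
        groups[rating].append(value)
    return dict(groups)


# Hypothetical stand-ins for Drama_DataFrame.Rating and
# Drama_DataFrame.Production_Budget, for illustration only.
sample_ratings = ["R", "PG", "R", "G", "PG-13", "PG"]
sample_budgets = [150000000.0, 180000000.0, 100000000.0,
                  35446775.0, 110000000.0, 37000000.0]

budgets_by_rating = group_by_rating(sample_ratings, sample_budgets)
print(budgets_by_rating["R"])
```

    With the real columns, the call would be group_by_rating(Drama_DataFrame.Rating, Drama_DataFrame.Production_Budget). Pairing the two columns with zip also avoids looking up values by position in the index.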
    

    This is a function called 'Average' that gets the average of a list.

    In [243]:
    def Average(l):
        avg = sum(l) / len(l)
        return avg
    

    Getting the average 'Star Rating' for each system rating in the Drama Genre from the 'Drama_DataFrame' dataframe. The averages are printed in the order R, PG, G, PG-13, NC-17.

    In [327]:
    for i in [rating,rating1,rating2,rating3,rating4]:print(Average(i))
    
    6.744155844155842
    6.8044776119403
    7.311764705882353
    6.7144736842105255
    6.630612244897959
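
    The 'Average' function raises a ZeroDivisionError when a list is empty, which can happen if a genre has no movies with a given rating. A hedged alternative, assuming that returning None for an empty group is acceptable, uses statistics.fmean from the standard library. The name `safe_average` is hypothetical.

```python
from statistics import fmean


def safe_average(values):
    """Return the mean of `values`, or None when the list is empty."""
    if not values:
        return None
    return fmean(values)


print(safe_average([7.2, 7.6, 7.3]))
print(safe_average([]))
```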
    

    Getting the average 'Budget' for each system rating in the Drama Genre from the 'Drama_DataFrame' dataframe. The averages are printed in the order R, PG, G, PG-13, NC-17.

    In [305]:
    for i in [budget,budget1,budget2,budget3,budget4]:print(Average(i))
    
    18335359.09090909
    21651074.62686567
    20182963.970588237
    25577399.64473684
    7479975.040816327
    

    Getting the average 'Revenue' for each system rating in the Drama Genre, using the 'world_int' lists. The averages are printed in the order R, PG, G, PG-13, NC-17.

    In [320]:
    for i in [world_int,world_int1,world_int2,world_int3,world_int4]:print(Average(i))
    
    74913332.64285715
    103158408.56521739
    146873371.52
    104420678.015625
    28266049.647058822
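
    The chart scripts further down contain these averages as literal numbers: the budget and revenue values with the decimals dropped, and the star ratings rounded to one decimal place. A sketch of producing those arrays from Python, so they can be pasted into the scripts without retyping, is shown here. The helper `to_series_data` is hypothetical, and the inputs are the averages printed above.

```python
import json


def to_series_data(averages, decimals=None):
    """Format averages as a JavaScript array literal for a chart series.

    With decimals=None the fractional part is dropped; otherwise the values
    are rounded to the given number of decimal places.
    """
    if decimals is None:
        values = [int(a) for a in averages]
    else:
        values = [round(a, decimals) for a in averages]
    return json.dumps(values)


# Averages printed above, in the order R, PG, G, PG-13, NC-17.
avg_budget = [18335359.09090909, 21651074.62686567, 20182963.970588237,
              25577399.64473684, 7479975.040816327]
avg_rating = [6.744155844155842, 6.8044776119403, 7.311764705882353,
              6.7144736842105255, 6.630612244897959]

budget_data = to_series_data(avg_budget)
rating_data = to_series_data(avg_rating, decimals=1)
print(budget_data)
print(rating_data)
```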
    

    This is the HTML script from the Highcharts library that sets up the containers for visualizing the Average Budget, Revenue and Star Ratings of all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Radial Bar Chart'. The charts are drawn with Javascript and HTML below and saved as .png files.

    In [73]:
    %%html
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/highcharts-more.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/export-data.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
    
    <table><tr><th></th><th></th><th></th><th></th></tr><tr><th></th><th></th><th></th><th></th></tr>
        <tr>
        <td><span class="gridMap" id="container6"></span></td>
        <td><span class="gridMap" id="container7"></span></td>
        <td><span class="gridMap" id="container8"></span></td>
        </tr>
    </table>
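
    Each chart is drawn into the element whose id is passed to Highcharts.chart, so every container needs its own id. As a sketch, the table markup could be generated from a list of ids, which makes duplicates easy to catch. The helper `container_table` is hypothetical and is not part of Highcharts or Jupyter.

```python
def container_table(container_ids):
    """Build a one-row HTML table with one chart container per id."""
    if len(set(container_ids)) != len(container_ids):
        raise ValueError("container ids must be unique")
    cells = "\n".join(
        f'    <td><span class="gridMap" id="{cid}"></span></td>'
        for cid in container_ids
    )
    return f"<table>\n  <tr>\n{cells}\n  </tr>\n</table>"


html = container_table(["container6", "container7", "container8"])
print(html)
```

    In a notebook, the returned string could be displayed with IPython.display.HTML(html).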
    

    This is the Javascript script from the Highcharts library that visualizes the 'Average Budget' of all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Radial Bar Chart'. A 'Radial Bar Chart' is similar to a bar chart, but the y-axis is circular. This is done using Javascript and HTML, and the chart is saved in a .png file called 'average-budget-of-all-sy.png'.

    In [74]:
    %%js
    Highcharts.chart('container6', {
        colors: ['#ff5500', '#D00000', '#800000', '#A00000', 'red'],
        chart: {
            type: 'column',
            height:550,
            width:500,
            inverted: true,
            polar: true
        },
        title: {
            text: 'Average Budget of All System Rating'
        },
        tooltip: {
            outside: true
        },
        pane: {
            size: '85%',
            innerSize: '20%',
            endAngle: 270
        },
        xAxis: {
            tickInterval: 1,
            labels: {
                align: 'right',
                useHTML: true,
                allowOverlap: true,
                step: 1,
                y: 3,
                style: {
                    fontSize: '13px'
                }
            },
            lineWidth: 0,
            categories: [
                'R',
                'PG',
                'G',
                'PG-13',
                'NC-17'
            ]
        },
        yAxis: {
            tickPositions:[0,2500000,5000000,7500000,10000000,12500000,15000000,17500000,20000000,22500000,25000000,27500000,30000000],
            labels: {
                formatter: function () {
                    return this.value / 1000000 + 'M';
                }
            },
            crosshair: {
                enabled: true,
                color: '#333'
            },
            lineWidth: 0,
            tickInterval: 25,
            reversedStacks: false,
            endOnTick: true,
            showLastLabel: true
        },
        plotOptions: {
            column: {
                stacking: 'normal',
                borderWidth: 0,
                pointPadding: 0,
                groupPadding: 0.15
            }
        },
        legend: {
            labelFormatter: function () {
                if (this.data.length > 0) {
                    return this.data[0].category;
                } else {
                    return this.name;
                }
            }
        },
        series: [{
            colorByPoint: true,
            name: 'Average Budget',
            data: [18335359, 21651074, 20182963, 25577399, 7479975]
        }]
    });
    

    This is the Javascript script from the Highcharts library that visualizes the 'Average Revenue' of all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Radial Bar Chart'. A 'Radial Bar Chart' is similar to a bar chart, but the y-axis is circular. This is done using Javascript and HTML, and the chart is saved in a .png file called 'average-revenue-of-all-sy.png'.

    In [340]:
    %%js
    Highcharts.chart('container7', {
        colors: ['#ff5500', '#D00000', '#800000', '#A00000', 'red'],
        chart: {
            type: 'column',
            height:550,
            width:500,
            inverted: true,
            polar: true
        },
        title: {
            text: 'Average Revenue of All System Rating'
        },
        tooltip: {
            outside: true
        },
        pane: {
            size: '85%',
            innerSize: '20%',
            endAngle: 270
        },
        xAxis: {
            tickInterval: 1,
            labels: {
                align: 'right',
                useHTML: true,
                allowOverlap: true,
                step: 1,
                y: 3,
                style: {
                    fontSize: '13px'
                }
            },
            lineWidth: 0,
            categories: [
                'R',
                'PG',
                'G',
                'PG-13',
                'NC-17'
            ]
        },
        yAxis: {
            tickPositions:[0,13000000,25000000,37500000,50000000,63500000,75000000,87500000,100000000,113500000,125000000, 137500000,150000000],
            labels: {
                formatter: function () {
                    return this.value / 1000000 + 'M';
                }
            },
            crosshair: {
                enabled: true,
                color: '#333'
            },
            lineWidth: 0,
            tickInterval: 25,
            reversedStacks: false,
            endOnTick: true,
            showLastLabel: true
        },
        plotOptions: {
            column: {
                stacking: 'normal',
                borderWidth: 0,
                pointPadding: 0,
                groupPadding: 0.15
            }
        },
        legend: {
            labelFormatter: function () {
                if (this.data.length > 0) {
                    return this.data[0].category;
                } else {
                    return this.name;
                }
            }
        },
        series: [{
            colorByPoint: true,
            name: 'Average Revenue',
            data: [74913332, 103158408, 146873371, 104420678, 28266049]
        }]
    });
    

    This is the Javascript script from the Highcharts library that visualizes the 'Average Ratings' of all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Radial Bar Chart'. A 'Radial Bar Chart' is similar to a bar chart, but the y-axis is circular. The chart is saved in a .png file called 'average-rating-of-all-sy.png'.

    In [341]:
    %%js
    Highcharts.chart('container8', {
        colors: ['#ff5500', '#D00000', '#800000', '#A00000', 'red'],
        chart: {
            type: 'column',
            height:550,
            width:500,
            inverted: true,
            polar: true
        },
        title: {
            text: 'Average Rating of All System Rating'
        },
        tooltip: {
            outside: true
        },
        pane: {
            size: '85%',
            innerSize: '20%',
            endAngle: 270
        },
        xAxis: {
            tickInterval: 1,
            labels: {
                align: 'right',
                useHTML: true,
                allowOverlap: true,
                step: 1,
                y: 3,
                style: {
                    fontSize: '13px'
                }
            },
            lineWidth: 0,
            categories: [
                'R',
                'PG',
                'G',
                'PG-13',
                'NC-17'
            ]
        },
        yAxis: {
            tickPositions:[0,.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5],
            crosshair: {
                enabled: true,
                color: '#333'
            },
            lineWidth: 0,
            tickInterval: 25,
            reversedStacks: false,
            endOnTick: true,
            showLastLabel: true
        },
        plotOptions: {
            column: {
                stacking: 'normal',
                borderWidth: 0,
                pointPadding: 0,
                groupPadding: 0.15
            }
        },
        legend: {
            labelFormatter: function () {
                if (this.data.length > 0) {
                    return this.data[0].category;
                } else {
                    return this.name;
                }
            }
        },
        series: [{
            colorByPoint: true,
            name: 'Average Rating',
            data: [6.7,6.8,7.3,6.7,6.6]
        }]
    });
    

    Analysis

    The Average Budgets of
    all the R-rated Movies in the Drama genre.

    The Average Revenue of
    all the R-rated Movies in the Drama genre.

    The Average Rating of
    all the R-rated Movies in the Drama genre.

    Blueprint: Movies that made Profit

    This is the blueprint for creating the seventh visualization, Movies that made Profit. Highcharts will be used to create this graph.

    Blueprint:

    The graph used for this visualization is the Highcharts Lollipop series, which is found in the Highcharts Demos. Lollipop charts are variants of column charts, with a circle marker for the data value and a line extending to the axis. The first step is to understand the format of the script. This graph uses two types of code: HTML and Javascript. JupyterLab supports both through magic commands, which use the '%' syntax element.

    Before the HTML and Javascript are written, a dataframe needs to be made to extract all the movies that made a profit in the Drama genre. The two variables needed from the parent dataframe '_all_drama_info1_' are the name and the profit as an integer.

    • HTML Section: The magic command '%%HTML' goes first in the cell. The main script lines used to create Highcharts graphs throughout this project are:
      <script src="https://code.highcharts.com/highcharts.js"></script>
      <script src="https://code.highcharts.com/highcharts-more.js"></script>
      <script src="https://code.highcharts.com/modules/exporting.js"></script>
      <script src="https://code.highcharts.com/modules/accessibility.js"></script>
      For this particular graph, these scripts are used to get the lollipop effect:
      <script src="https://code.highcharts.com/modules/dumbbell.js"></script>
      <script src="https://code.highcharts.com/modules/lollipop.js"></script>
      To close off the HTML script, the markup below is used. To identify this container and differentiate it from the others, the div id has to be named, and if the graph is used more than once, each div id needs a different name or the chart will not work:
      <figure class="highcharts-figure">
      <div id="r1"></div>
      <p class= "highcharts-description">
      </p>
      </figure>
    • Javascript Section: The magic command '%%js' is used to write Javascript in the JupyterLab cell. The Javascript section has comments and explanations, but for further information go to the Highcharts Demos. The normal layout is horizontal, but for this graph it is going to be vertical. The main Javascript used to create this lollipop graph is:
      name: ' The Sound of Music '
      low: 2303109231

    This is the 'Drama_DataFrame' dataframe.

    In [353]:
    Drama_DataFrame
    
    Out[353]:
    Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x Worldwide_Gross Worldwide_Gross_x Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
    0 Hugo Nov 23, 2011 Drama PG 180000000.0 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 180047784 $180,047,784 47784.0 $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Asa Butterfield Martin Scorsese John Logan
    1 The Wolfman Feb 12, 2010 Drama R 150000000.0 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 142634358 $142,634,358 -7365642.0 $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Benicio Del Toro Joe Johnston Andrew Kevin Walker
    2 Gravity Oct 4, 2013 Drama PG-13 110000000.0 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 693698673 $693,698,673 583698673.0 $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. Sandra Bullock Alfonso Cuarón Alfonso Cuarón
    3 Django Unchained Dec 25, 2012 Drama R 100000000.0 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 449948323 $449,948,323 349948323.0 $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Jamie Foxx Quentin Tarantino Quentin Tarantino
    4 Sing Dec 21, 2016 Drama PG-13 75000000.0 $75,000,000 270329045 $270,329,045 363800000.0 $363,800,000 634454789 $634,454,789 559454789.0 $559,454,789 63445479 63,445,479 98.0 7.1 TriStar Pictures Lorraine Bracco Richard Baskin Dean Pitchford
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    301 A Dirty Shame September 24, 2004 Drama NC-17 15000000.0 $15,000,000 1339668 $1,339,668 574498.0 $574,498 1914166 $1,914,166 -13085834.0 $-13,085,834 191417 191,417 84.0 5.1 Killer Films Suzanne Shepherd John Waters John Waters
    302 Young Adam April 16, 2004 Drama NC-17 6400000.0 $6,400,000 767373 $767,373 1794447.0 $1,794,447 2561820 $2,561,820 -3838180.0 $-3,838,180 256182 256,182 98.0 6.4 Recorded Picture Company Tilda Swinton David Mackenzie \tDavid Mackenzie
    303 Whore 1991 October 4, 1991 Drama NC-17 50000.0 $50,000 0 $0 0.0 $0 1008404 $1,008,404 958404.0 $958,404 100840 100,840 80.0 5.5 Cheap Date Theresa Russell Ken Russell Deborah Dalton
    304 Ma Mère May 13, 2005 Drama NC-17 3259572.0 $3,259,572 71616 $71,616 950532.0 $950,532 1022148 $1,022,148 -2237424.0 $-2,237,424 102215 102,215 110.0 5.0 Gemini Films Louis Garrel Christophe Honoré Christophe Honoré
    305 Law of Desire April 3, 1987 Drama NC-17 612072.0 $612,072 0 $0 0.0 $0 1470809 $1,470,809 858737.0 $858,737 147081 147,081 82.0 7.1 El Deseo Antonio Banderas Pedro Almodóvar Pedro Almodóvar

    306 rows × 22 columns

    Getting the 'Profit' of each 'R-rated movie' in the Drama Genre from the 'Drama_DataFrame' dataframe.

    In [406]:
    sum_r = []
    for i in profit_int:
        if i < 0: continue
        else: sum_r.append(i)
    print(sum_r)
    
    [349948323, 307567189, 24154026, 326398492, 316350619, 19966854, 82112435, 530998101, 13147416, 129558438, 54735925, 9898681, 8554727, 17017873, 26604054, 8270399, 318266710, 25358392, 23262783, 7859167, 23830713, 34913, 31043521, 45178935, 60133905, 12417298, 69233867, 3765283, 12499242, 12636004, 222016, 53273049, 36954520, 17033227, 35669037, 20251930, 14610760, 14131551, 9295324, 8153415, 88390, 4328516, 19282640, 12744931, 15566240, 4438911, 156309, 294448, 2669782, 48766923, 68711836, 14718173, 1851683, 556082, 1500000, 2000000]
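
    The filtering loop above can also be written as a list comprehension. This is a minimal sketch under assumptions: the function name profitable_only is hypothetical, and the sample list holds only a few profit figures taken from the 'Drama_DataFrame' output for illustration, not the full 'profit_int' list.

```python
def profitable_only(profits):
    """Keep values that are zero or greater, mirroring the loop's `i < 0` skip."""
    return [p for p in profits if p >= 0]


# Illustrative sample only: two profits and two losses from the table above.
sample_profits = [349948323, -7365642, 34913, -13085834]
sample_sum_r = profitable_only(sample_profits)
print(sample_sum_r)
```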
    

    Getting the 'Total Profit' of all the 'R-rated movies' in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'Total Profit' is repeated once for each R-rated movie in the 'Drama_DataFrame' dataframe because it will be used as the line series in the Javascript graph below.

    In [426]:
    var_r = []
    for i in profit_int: var_r.append(sum(sum_r))
    print(var_r)
    
    [3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978]
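
    The cell above recomputes the same sum on every pass of the loop. A minimal sketch of an equivalent approach computes the total once and repeats it. The function name repeated_total is hypothetical, the numbers are toy values, and the sketch assumes the list being summed and the list that sets the length are the same, which may not hold for 'profit_int' and 'sum_r' in the notebook.

```python
def repeated_total(profits):
    """Return the total of `profits`, repeated once per entry."""
    total = sum(profits)           # computed a single time
    return [total] * len(profits)  # one copy per movie, for the line series


toy_profits = [100, 250, 50]  # toy values, not real movie data
print(repeated_total(toy_profits))  # [400, 400, 400]
```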
    

    Getting the 'Profit' of each 'NC-17 rated movie' in the Drama Genre from the 'Drama_DataFrame' dataframe.

    In [408]:
    sum_nc17 = []
    for i in profit_int4:
        if i < 0: continue
        else: sum_nc17.append(i)
    print(sum_nc17)
    
    [13912841, 4856268, 8404, 257845, 659312, 18912216, 89410061, 121165, 52091915, 13912841, 15465835, 307113, 13912841, 15390895, 15566240, 1315026, 256669, 201120004, 50167430, 2311944, 13912841, 2548651, 16283563, 3664240, 1038916, 8000000, 18912216, 94673038, 34897711, 401802, 50167430, 3546453, 958404, 858737]
    

    Getting the 'Total Profit' of all the 'NC-17 rated movies' in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'Total Profit' is repeated once for each NC-17 rated movie in the 'Drama_DataFrame' dataframe because it will be used as the line series in the Javascript graph below.

    In [427]:
    var_nc17 = []
    for i in profit_int4: var_nc17.append(sum(sum_nc17))
    print(var_nc17)
    
    [759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867]
    

    Getting the 'Profit' of each 'PG-rated movie' in the Drama Genre from the 'Drama_DataFrame' dataframe.

    In [411]:
    sum_pg = []
    for i in profit_int1:
        if i < 0: continue
        else: sum_pg.append(i)
    print(sum_pg)
    
    [47784, 59068724, 284604712, 72678948, 70975239, 10531500, 4609597, 36918287, 447351353, 70986904, 285937718, 176601214, 33102988, 26696000, 35694916, 4344615, 6741732, 74830111, 10948425, 120587063, 34605762, 32973297, 69137047, 62667874, 83269971, 120036382, 81120329, 3835130, 118582776, 3101815, 48954968, 5164458, 107956187, 31440294, 12815212, 150297525, 21856053, 104285432, 28716963, 7423752, 108052686, 544368315, 42892670, 3943124, 71808942, 20000000]
    

    Getting the 'Total Profit' of all the 'PG-rated movies' in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'Total Profit' is repeated once for each PG-rated movie in the 'Drama_DataFrame' dataframe because it will be used as the line series in the Javascript graph below.

    In [428]:
    var_pg = []
    for i in profit_int1: var_pg.append(sum(sum_pg))
    print(var_pg)
    
    [3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794]
    

    Getting the 'Profit' of each 'PG-13 rated movie' in the Drama Genre from the 'Drama_DataFrame' dataframe.

    In [414]:
    sum_pg13 = []
    for i in profit_int3:
        if i < 0: continue
        else: sum_pg13.append(i)
    print(sum_pg13)
    
    [583698673, 559454789, 77551594, 35552675, 163591522, 129748880, 58660270, 22004627, 156127894, 4478084, 122498338, 129590606, 78809717, 136567581, 60143987, 49309093, 217276928, 26721826, 29802928, 132552290, 167618160, 38984536, 66050951, 15059418, 188120004, 117033509, 71633833, 41540205, 4847480, 57917283, 40282881, 188265198, 2281732, 57086711, 317522294, 21028230, 36545707, 40506120, 113955898, 5601987, 44168692, 20044909, 20069303, 20909437, 11477345, 67356170, 51076141, 51603136, 21556959, 27087044, 72831866, 12971021, 23787727, 29964656, 10369708, 143806510, 36699612, 13945682, 1205034, 12698355, 33185884, 4152584, 3478400, 1927779]
    

    Getting the 'Total Profit' of all the 'PG-13 rated movies' in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'Total Profit' is repeated once for each PG-13 rated movie in the 'Drama_DataFrame' dataframe because it will be used as the line series in the Javascript graph below.

    In [429]:
    var_pg13 = []
    for i in profit_int3: var_pg13.append(sum(sum_pg13))
    print(var_pg13)
    
    [5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393]
    

    Getting the 'Profit' of each 'G-rated movie' in the Drama Genre from the 'Drama_DataFrame' dataframe.

    In [417]:
    sum_g = []
    for i in profit_int2:
        if i < 0: continue
        else: sum_g.append(i)
    print(sum_g)
    
    [1711143, 11587135, 58693537, 418656843, 43947950, 12469621, 35099643, 255500000, 216100000, 1250000, 3851000, 58985708, 7657973, 58491516, 293281000, 278014195, 30482317, 941214868, 267142000, 55071636, 37707417, 23794409, 52500000, 5850377, 10300000]
    

    Getting the 'Total Profit' of all the 'G-rated movies' in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'Total Profit' is repeated once for each G-rated movie in the 'Drama_DataFrame' dataframe because it will be used as the line series in the Javascript graph below.

    In [430]:
    var_g = []
    for i in profit_int2: var_g.append(sum(sum_g))
    print(var_g)
    
    [3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288]
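
    The same filter-and-total step is repeated above for each of the five system ratings. A minimal sketch of doing it in one pass is shown here. The dictionary name profits_by_rating and its numbers are made-up placeholders, not values from 'Drama_DataFrame'; in the notebook the lists would be 'profit_int', 'profit_int1', and so on.

```python
# Placeholder data: rating -> list of per-movie profits (toy values).
profits_by_rating = {
    "R": [120, -30, 45],
    "PG": [-10, 60],
    "G": [15],
}

# Total of the non-negative profits for each rating.
totals = {
    rating: sum(p for p in profits if p >= 0)
    for rating, profits in profits_by_rating.items()
}
print(totals)
```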
    

    Using a for loop to print the name and profit of every R-rated movie as Javascript array rows, which will be copied and pasted into the Javascript cell below.

    In [419]:
    for x in range(len(name)):
        print("[ '",name[x],"'",',', profit_int[x],'],')
    
    [ ' Django Unchained ' , 349948323 ],
    [ ' Gone Girl ' , 307567189 ],
    [ ' Priest ' , 24154026 ],
    [ ' Fifty Shades Darker ' , 326398492 ],
    [ ' Fifty Shades Freed ' , 316350619 ],
    [ ' Crimson Peak ' , 19966854 ],
    [ ' Zero Dark Thirty ' , 82112435 ],
    [ ' Fifty Shades of Grey ' , 530998101 ],
    [ ' The Master ' , 13147416 ],
    [ ' Flight ' , 129558438 ],
    [ ' The Ides of March ' , 54735925 ],
    [ ' Nocturnal Animals ' , 9898681 ],
    [ ' The Water Diviner ' , 8554727 ],
    [ ' For Colored Girls ' , 17017873 ],
    [ ' The Debt ' , 26604054 ],
    [ ' Let Me In ' , 8270399 ],
    [ ' Black Swan ' , 318266710 ],
    [ ' Ex Machina ' , 25358392 ],
    [ ' Room ' , 23262783 ],
    [ ' If Beale Street Could Talk ' , 7859167 ],
    [ ' Arbitrage ' , 23830713 ],
    [ ' Stoker ' , 34913 ],
    [ ' Carol ' , 31043521 ],
    [ ' Quartet ' , 45178935 ],
    [ ' Hereditary ' , 60133905 ],
    [ ' Melancholia ' , 12417298 ],
    [ ' Manchester by the Sea ' , 69233867 ],
    [ ' We Need to Talk About Kevin ' , 3765283 ],
    [ ' Addicted ' , 12499242 ],
    [ ' Mommy ' , 12636004 ],
    [ ' Take Shelter ' , 222016 ],
    [ ' Boyhood ' , 53273049 ],
    [ ' The Witch ' , 36954520 ],
    [ ' Margin Call ' , 17033227 ],
    [ ' Whiplash ' , 35669037 ],
    [ ' Before Midnight ' , 20251930 ],
    [ ' Silent House ' , 14610760 ],
    [ ' Winter's Bone ' , 14131551 ],
    [ ' The Florida Project ' , 9295324 ],
    [ ' We Are Your Friends ' , 8153415 ],
    [ ' Locke ' , 88390 ],
    [ ' Knock Knock ' , 4328516 ],
    [ ' Buried ' , 19282640 ],
    [ ' Unsane ' , 12744931 ],
    [ ' Blue Valentine ' , 15566240 ],
    [ ' Martha Marcy May Marlene ' , 4438911 ],
    [ ' Palo Alto ' , 156309 ],
    [ ' Sound of My Voice ' , 294448 ],
    [ ' A Ghost Story ' , 2669782 ],
    [ ' Ordinary People ' , 48766923 ],
    [ ' Fame ' , 68711836 ],
    [ ' Endless Love ' , 14718173 ],
    [ ' Ghost Story ' , 1851683 ],
    [ ' Zoot Suit ' , 556082 ],
    [ ' Rich and Famous ' , 1500000 ],
    [ ' Raggedy Man ' , 2000000 ],
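
    Rows printed with hand-written single quotes break in Javascript when a title contains an apostrophe, as in the "Winter's Bone" row above. A minimal sketch using the standard json module avoids this, because json.dumps emits double-quoted, escaped strings that are also valid Javascript. The function name to_js_rows is hypothetical, and stripping the surrounding spaces from titles is an assumption about the desired output.

```python
import json


def to_js_rows(names, profits):
    """Format (name, profit) pairs as Javascript array rows, one per line."""
    rows = []
    for title, profit in zip(names, profits):
        # json.dumps quotes and escapes the title safely.
        rows.append(json.dumps([title.strip(), profit]) + ",")
    return "\n".join(rows)


# Two rows from the R-rated output above.
sample_names = [" Winter's Bone ", " Locke "]
sample_values = [14131551, 88390]
print(to_js_rows(sample_names, sample_values))
```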
    

    Using a for loop to print the name and profit of every NC-17 rated movie as Javascript array rows, which will be copied and pasted into the Javascript cell below.

    In [420]:
    for x in range(len(name4)):
        print("[ '",name4[x],"'",',', profit_int4[x],'],')
    
    [ ' Shame ' , 13912841 ],
    [ ' Matador ' , 4856268 ],
    [ ' Whore ' , 8404 ],
    [ ' Tokyo Decadence ' , 257845 ],
    [ ' Wide Sargasso Sea ' , 659312 ],
    [ ' Kids ' , 18912216 ],
    [ ' Crash ' , 89410061 ],
    [ ' The Dreamers ' , 121165 ],
    [ ' Lust, Caution ' , 52091915 ],
    [ ' Shame ' , 13912841 ],
    [ ' Blue Is the Warmest Colour ' , 15465835 ],
    [ ' The Dreamers ' , 307113 ],
    [ ' Shame ' , 13912841 ],
    [ ' Blue Is the Warmest Colour ' , 15390895 ],
    [ ' Blue Valentine ' , 15566240 ],
    [ ' Two Girls and a Guy ' , 1315026 ],
    [ ' Elles ' , 256669 ],
    [ ' Hell ' , 201120004 ],
    [ ' Se, jie ' , 50167430 ],
    [ ' The Evil Dead ' , 2311944 ],
    [ ' Shame ' , 13912841 ],
    [ ' Arabian Nights ' , 2548651 ],
    [ ' Natural Born Killers ' , 16283563 ],
    [ ' Clerks ' , 3664240 ],
    [ ' Bad Lieutenant ' , 1038916 ],
    [ ' Beyond the Valley of the Dolls ' , 8000000 ],
    [ ' Kids ' , 18912216 ],
    [ ' Crash ' , 94673038 ],
    [ ' Last Tango in Paris ' , 34897711 ],
    [ ' Pink Flamingos ' , 401802 ],
    [ ' Lust, Caution  ' , 50167430 ],
    [ ' Happiness 1998 ' , 3546453 ],
    [ ' Whore 1991 ' , 958404 ],
    [ ' Law of Desire ' , 858737 ],
    

    Using a for loop to print the name and profit of every PG-rated movie as Javascript array rows, which will be copied and pasted into the Javascript cell below.

    In [421]:
    for x in range(len(name1)):
        print("[ '",name1[x],"'",',', profit_int1[x],'],')
    
    [ ' Hugo ' , 47784 ],
    [ ' Dolphin Tale ' , 59068724 ],
    [ ' Wonder ' , 284604712 ],
    [ ' The Last Song ' , 72678948 ],
    [ ' War Room ' , 70975239 ],
    [ ' The Lunchbox ' , 10531500 ],
    [ ' Somewhere in Time ' , 4609597 ],
    [ ' Urban Cowboy ' , 36918287 ],
    [ ' Cinderella ' , 447351353 ],
    [ ' War Room ' , 70986904 ],
    [ ' Wonder ' , 285937718 ],
    [ ' Little Women ' , 176601214 ],
    [ ' Overcomer ' , 33102988 ],
    [ ' The Jazz Singer ' , 26696000 ],
    [ ' A Walk to Remember ' , 35694916 ],
    [ ' Tuck Everlasting ' , 4344615 ],
    [ ' Dreamer ' , 6741732 ],
    [ ' The Lake House ' , 74830111 ],
    [ ' Akeelah and the Bee ' , 10948425 ],
    [ ' Bridge to Terabithia ' , 120587063 ],
    [ ' August Rush ' , 34605762 ],
    [ ' Fireproof ' , 32973297 ],
    [ ' The Last Song ' , 69137047 ],
    [ ' God's Not Dead ' , 62667874 ],
    [ ' Mr. Holland's Opus ' , 83269971 ],
    [ ' Phenomenon ' , 120036382 ],
    [ ' Contact ' , 81120329 ],
    [ ' The Spanish Prisoner ' , 3835130 ],
    [ ' Sense and Sensibility ' , 118582776 ],
    [ ' The Secret of Roan Inish ' , 3101815 ],
    [ ' The Remains of the Day ' , 48954968 ],
    [ ' Pure Country ' , 5164458 ],
    [ ' Forever Young ' , 107956187 ],
    [ ' A River Runs Through It ' , 31440294 ],
    [ ' Honeysuckle Rose ' , 12815212 ],
    [ ' Resurrection ' , 150297525 ],
    [ ' Taps ' , 21856053 ],
    [ ' On Golden Pond ' , 104285432 ],
    [ ' Absence of Malice ' , 28716963 ],
    [ ' The Night the Lights Went Out in Georgia ' , 7423752 ],
    [ ' Rocky III ' , 108052686 ],
    [ ' Tex ' , 544368315 ],
    [ ' Staying Alive ' , 42892670 ],
    [ ' Tender Mercies ' , 3943124 ],
    [ ' Footloose ' , 71808942 ],
    [ ' The Natural ' , 20000000 ],
    

    Using a for loop to print the name and profit of every PG-13 rated movie as Javascript array rows, which will be copied and pasted into the Javascript cell below.

    In [422]:
    for x in range(len(name3)):
        print("[ '",name3[x],"'",',', profit_int3[x],'],')
    
    [ ' Gravity ' , 583698673 ],
    [ ' Sing ' , 559454789 ],
    [ ' Contagion ' , 77551594 ],
    [ ' Burlesque ' , 35552675 ],
    [ ' Creed II ' , 163591522 ],
    [ ' The Post ' , 129748880 ],
    [ ' Hereafter ' , 58660270 ],
    [ ' Anna Karenina ' , 22004627 ],
    [ ' Arrival ' , 156127894 ],
    [ ' Charlie St. Cloud ' , 4478084 ],
    [ ' Bridge of Spies ' , 122498338 ],
    [ ' The Impossible ' , 129590606 ],
    [ ' Water for Elephants ' , 78809717 ],
    [ ' Creed ' , 136567581 ],
    [ ' The Rite ' , 60143987 ],
    [ ' Collateral Beauty ' , 49309093 ],
    [ ' True Grit ' , 217276928 ],
    [ ' The Tree of Life ' , 26721826 ],
    [ ' The Longest Ride ' , 29802928 ],
    [ ' Step Up Revolution ' , 132552290 ],
    [ ' The Vow ' , 167618160 ],
    [ ' The Age of Adaline ' , 38984536 ],
    [ ' Safe Haven ' , 66050951 ],
    [ ' The Best of Me ' , 15059418 ],
    [ ' The Help ' , 188120004 ],
    [ ' Dear John ' , 117033509 ],
    [ ' The Lucky One ' , 71633833 ],
    [ ' The Giver ' , 41540205 ],
    [ ' Draft Day ' , 4847480 ],
    [ ' Rings ' , 57917283 ],
    [ ' Fences ' , 40282881 ],
    [ ' Me Before You ' , 188265198 ],
    [ ' The Light Between Oceans ' , 2281732 ],
    [ ' The Book Thief ' , 57086711 ],
    [ ' A Quiet Place ' , 317522294 ],
    [ ' Beastly ' , 21028230 ],
    [ ' The Roommate ' , 36545707 ],
    [ ' Remember Me ' , 40506120 ],
    [ ' The Woman in Black ' , 113955898 ],
    [ ' Country Strong ' , 5601987 ],
    [ ' One Day ' , 44168692 ],
    [ ' Suffragette ' , 20044909 ],
    [ ' The Perks of Being a Wallflower ' , 20069303 ],
    [ ' Project Almanac ' , 20909437 ],
    [ ' Wish Upon ' , 11477345 ],
    [ ' If I Stay ' , 67356170 ],
    [ ' Brooklyn ' , 51076141 ],
    [ ' Everything, Everything ' , 51603136 ],
    [ ' Mud ' , 21556959 ],
    [ ' Amour ' , 27087044 ],
    [ ' Ouija: Origin of Evil ' , 72831866 ],
    [ ' Black or White ' , 12971021 ],
    [ ' The Bye Bye Man ' , 23787727 ],
    [ ' Gifted ' , 29964656 ],
    [ ' The Words ' , 10369708 ],
    [ ' Lights Out ' , 143806510 ],
    [ ' Still Alice ' , 36699612 ],
    [ ' Before I Fall ' , 13945682 ],
    [ ' Rabbit Hole ' , 1205034 ],
    [ ' Ida ' , 12698355 ],
    [ ' Courageous ' , 33185884 ],
    [ ' Mustang ' , 4152584 ],
    [ ' Like Crazy ' , 3478400 ],
    [ ' Another Earth ' , 1927779 ],
    

    Using a for loop to print the name and profit of every G-rated movie as Javascript array rows, which will be copied and pasted into the Javascript cell below.

    In [423]:
    for x in range(len(name2)):
        print("[ '",name2[x],"'",',', profit_int2[x],'],')
    
    [ ' A Sunday in the Country ' , 1711143 ],
    [ ' Prancer ' , 11587135 ],
    [ ' The Rookie ' , 58693537 ],
    [ ' Beauty and the Beast 1991 ' , 418656843 ],
    [ ' The Little Rascals ' , 43947950 ],
    [ ' Ramona and Beezus ' , 12469621 ],
    [ ' The Black Stallion ' , 35099643 ],
    [ ' The Hunchback of Notre Drame ' , 255500000 ],
    [ ' Babe ' , 216100000 ],
    [ ' Pollyanna ' , 1250000 ],
    [ ' Lassie Come Home ' , 3851000 ],
    [ ' Charlotte's Web ' , 58985708 ],
    [ ' Kit Kittredge: An American Girl ' , 7657973 ],
    [ ' The Rookie ' , 58491516 ],
    [ ' The Secret Garden ' , 293281000 ],
    [ ' The Sound of Music ' , 278014195 ],
    [ ' The Tale of Despereaux ' , 30482317 ],
    [ ' The Lion King 1994 ' , 941214868 ],
    [ ' Bambi 1942 ' , 267142000 ],
    [ ' My Fair Lady 1964 ' , 55071636 ],
    [ ' Hachiko: A Dog's Story ' , 37707417 ],
    [ ' Giant ' , 23794409 ],
    [ ' The Ten Commandments 1966 ' , 52500000 ],
    [ ' The Quiet Man ' , 5850377 ],
    [ ' Three Cions in the Fountain ' , 10300000 ],
    

    This is the HTML script from the Highcharts library used to visualize the Total Profit of each System Rating for all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Column and Line series'. This will be done using Javascript and HTML below, and the result will be saved in a .png file.

    In [18]:
    %%HTML
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
    <script src="https://code.highcharts.com/themes/sunset.js"></script>
    <script src="https://code.highcharts.com/modules/export-data.js"></script>
    <script src="https://code.highcharts.com/modules/column.js"></script>
    
    <figure class="highcharts-figure">
        <div id="-"></div>
        <p class="highcharts-description">
        </p>
    </figure>
    

    In [20]:
    %%js
    function dollarFormat(x) {
        return '$' + Highcharts.numberFormat(x, 0, '.', ',');
    }
    
    var colors = Highcharts.getOptions().colors;
    
    Highcharts.chart('-', {
        chart: {
            type: 'column',
            inverted: false,
            height: 450,
            width: 1100,
            
        },
    
        accessibility: {
            series: {
                descriptionFormatter: function (series) {
                    return series.type === 'line' ?
                        series.name + ', ' + dollarFormat(series.points[0].y) :
                        series.name + ' movie profits, bar series with ' +
                        series.points.length + ' bars.';
                }
            },
            point: {
                valuePrefix: '$'
            },
            keyboardNavigation: {
                seriesNavigation: {
                    mode: 'serialize'
                }
            }
        },
    
        title: {
            text: 'Total Net Profit of each System Rating in the Drama Genre',
            margin: 35
        },
    
        subtitle: {
            text: 'There are five System Ratings: R-rated| G-rated| PG-rated| PG-13 rated| NC-17 rated '
        },
    
        xAxis: {
            visible: false,
            accessibility: {
                description: 'Movies',
                rangeDescription: ''
            }
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
    
        yAxis: [{
            min: 0,
            max: 900000000,
            step: 250000000,
            labels: {
                format: '${text}'
            },
            title: {
                text: 'Movies Profit'
            },
            gridLineWidth: 1
        }, {
            accessibility: {
                description: 'System Ratings Category Totals'
            },
            opposite: true,
            min: 0,
            max: 7000000000,
            step: 1000000000,
            gridLineWidth: 0,
            labels: {
                format: '${text}',
                style: {
                    color: '#8F6666'
                }
            },
            title: {
                text: 'System Ratings Category Total',
                style: {
                    color: '#8F6666'
                }
            }
        }],
    
        credits: {
            enabled: false
        },
    
        plotOptions: {
            column: {
                keys: ['name', 'y'],
                grouping: false,
                pointPadding: 0.1,
                groupPadding: 0,
                tooltip: {
                    headerFormat: '<span style="font-size: 10px">' +
                        '<span style="color:{point.color}">\u25CF</span> ' +
                        '{series.name}</span><br/>',
                    pointFormat: '{point.name}: <b>${point.y:,.0f}</b><br/>'
                }
            },
            line: {
                yAxis: 1,
                lineWidth: 5,
                accessibility: {
                    exposeAsGroupOnly: true
                },
                marker: {
                    enabled: false
                },
                enableMouseTracking: false,
                linkedTo: ':previous',
                dataLabels: {
                    enabled: true,
                    verticalAlign: 'bottom',
                    style: {
                        color: '#757575',
                        fontWeight: 'normal'
                    },
                    formatter: function () {
                        if (this.point === this.series.points[Math.floor(
                            this.series.points.length / 2
                        )]) {
                            return 'Total: $' + Highcharts.numberFormat(this.y, 0);
                        }
                    }
                }
            }
        },
    
        responsive: {
            rules: [{
                condition: {
                    maxWidth: 400
                },
                chartOptions: {
                    chart: {
                        spacingLeft: 3,
                        spacingRight: 5
                    },
                    yAxis: [{}, {
                        visible: false
                    }]
                }
            }]
        },
    
        series: [{
            name: 'System Rating R',
            color: '#ff0000',
            borderColor: '#A59273',
            borderWidth: 1,
            data: [
                [ ' Django Unchained ' , 349948323 ],
                [ ' Gone Girl ' , 307567189 ],
                [ ' Priest ' , 24154026 ],
                [ ' Fifty Shades Darker ' , 326398492 ],
                [ ' Fifty Shades Freed ' , 316350619 ],
                [ ' Crimson Peak ' , 19966854 ],
                [ ' Zero Dark Thirty ' , 82112435 ],
                [ ' The Master ' , 13147416 ],
                [ ' Flight ' , 129558438 ],
                [ ' The Ides of March ' , 54735925 ],
                [ ' Nocturnal Animals ' , 9898681 ],
                [ ' The Water Diviner ' , 8554727 ],
                [ ' For Colored Girls ' , 17017873 ],
                [ ' The Debt ' , 26604054 ],
                [ ' Let Me In ' , 8270399 ],
                [ ' Black Swan ' , 318266710 ],
                [ ' Ex Machina ' , 25358392 ],
                [ ' Room ' , 23262783 ],
                [ ' If Beale Street Could Talk ' , 7859167 ],
                [ ' Arbitrage ' , 23830713 ],
                [ ' Stoker ' , 34913 ],
                [ ' Carol ' , 31043521 ],
                [ ' Quartet ' , 45178935 ],
                [ ' Hereditary ' , 60133905 ],
                [ ' Melancholia ' , 12417298 ],
                [ ' Manchester by the Sea ' , 69233867 ],
                [ ' We Need to Talk About Kevin ' , 3765283 ],
                [ ' Addicted ' , 12499242 ],
                [ ' Mommy ' , 12636004 ],
                [ ' Take Shelter ' , 222016 ],
                [ ' Boyhood ' , 53273049 ],
                [ ' The Witch ' , 36954520 ],
                [ ' Margin Call ' , 17033227 ],
                [ ' Whiplash ' , 35669037 ],
                [ ' Before Midnight ' , 20251930 ],
                [ ' Silent House ' , 14610760 ],
                [ ' Winter\'s Bone ' , 14131551 ],
                [ ' The Florida Project ' , 9295324 ],
                [ ' We Are Your Friends ' , 8153415 ],
                [ ' Locke ' , 88390 ],
                [ ' Knock Knock ' , 4328516 ],
                [ ' Buried ' , 19282640 ],
                [ ' Unsane ' , 12744931 ],
                [ ' Blue Valentine ' , 15566240 ],
                [ ' Martha Marcy May Marlene ' , 4438911 ],
                [ ' Palo Alto ' , 156309 ],
                [ ' Sound of My Voice ' , 294448 ],
                [ ' A Ghost Story ' , 2669782 ],
                [ ' Ordinary People ' , 48766923 ],
                [ ' Fame ' , 68711836 ],
                [ ' Endless Love ' , 14718173 ],
                [ ' Ghost Story ' , 1851683 ],
                [ ' Zoot Suit ' , 556082 ],
                [ ' Rich and Famous ' , 1500000 ],
                [ ' Raggedy Man ' , 2000000 ],
               ]
        }, {
            type: 'line',
            name: 'System Rating R',
            data: [
                3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 
                3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 
                3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 
                3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 
                3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 
                3278073978, 3278073978, 3278073978
                 
            ],
            color: '#ff1919'
        }, {
            name: 'System Rating NC-17',
            color: '#d61111',
            data: [
                [ ' Shame ' , 13912841 ],
                [ ' Matador ' , 4856268 ],
                [ ' Whore ' , 8404 ],
                [ ' Tokyo Decadence ' , 257845 ],
                [ ' Wide Sargasso Sea ' , 659312 ],
                [ ' Kids ' , 18912216 ],
                [ ' Crash ' , 89410061 ],
                [ ' The Dreamers ' , 121165 ],
                [ ' Lust, Caution ' , 52091915 ],
                [ ' Shame ' , 13912841 ],
                [ ' Blue Is the Warmest Colour ' , 15465835 ],
                [ ' The Dreamers ' , 307113 ],
                [ ' Shame ' , 13912841 ],
                [ ' Blue Is the Warmest Colour ' , 15390895 ],
                [ ' Blue Valentine ' , 15566240 ],
                [ ' Two Girls and a Guy ' , 1315026 ],
                [ ' Elles ' , 256669 ],
                [ ' Se, jie ' , 50167430 ],
                [ ' The Evil Dead ' , 2311944 ],
                [ ' Shame ' , 13912841 ],
                [ ' Arabian Nights ' , 2548651 ],
                [ ' Natural Born Killers ' , 16283563 ],
                [ ' Clerks ' , 3664240 ],
                [ ' Bad Lieutenant ' , 1038916 ],
                [ ' Beyond the Valley of the Dolls ' , 8000000 ],
                [ ' Kids ' , 18912216 ],
                [ ' Crash ' , 94673038 ],
                [ ' Last Tango in Paris ' , 34897711 ],
                [ ' Pink Flamingos ' , 401802 ],
                [ ' Lust, Caution  ' , 50167430 ],
                [ ' Happiness 1998 ' , 3546453 ],
                [ ' Whore 1991 ' , 958404 ],
                [ ' Law of Desire ' , 858737 ],
            ],
            pointStart: 59
        }, {
            type: 'line',
            name: 'System Rating NC-17',
            data: [
                759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 
                759820867, 759820867, 759820867, 759820867, 759820867, 759820867,  
                759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 
                759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 
                759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 
                759820867, 759820867, 759820867, 759820867
            ],
            pointStart: 59,
            color: '#d61111'
        }, {
            name: 'System Rating PG',
            color: '#a10505',
            data: [
                [ ' Hugo ' , 47784 ],
                [ ' Dolphin Tale ' , 59068724 ],
                [ ' Wonder ' , 284604712 ],
                [ ' The Last Song ' , 72678948 ],
                [ ' War Room ' , 70975239 ],
                [ ' The Lunchbox ' , 10531500 ],
                [ ' Somewhere in Time ' , 4609597 ],
                [ ' Urban Cowboy ' , 36918287 ],
                [ ' Cinderella ' , 447351353 ],
                [ ' War Room ' , 70986904 ],
                [ ' Wonder ' , 285937718 ],
                [ ' Little Women ' , 176601214 ],
                [ ' Overcomer ' , 33102988 ],
                [ ' The Jazz Singer ' , 26696000 ],
                [ ' A Walk to Remember ' , 35694916 ],
                [ ' Tuck Everlasting ' , 4344615 ],
                [ ' Dreamer ' , 6741732 ],
                [ ' The Lake House ' , 74830111 ],
                [ ' Akeelah and the Bee ' , 10948425 ],
                [ ' Bridge to Terabithia ' , 120587063 ],
                [ ' August Rush ' , 34605762 ],
                [ ' Fireproof ' , 32973297 ],
                [ ' The Last Song ' , 69137047 ],
                [ ' God\'s Not Dead ' , 62667874 ],
                [ ' Mr. Holland\'s Opus ' , 83269971 ],
                [ ' Phenomenon ' , 120036382 ],
                [ ' Contact ' , 81120329 ],
                [ ' The Spanish Prisoner ' , 3835130 ],
                [ ' Sense and Sensibility ' , 118582776 ],
                [ ' The Secret of Roan Inish ' , 3101815 ],
                [ ' The Remains of the Day ' , 48954968 ],
                [ ' Pure Country ' , 5164458 ],
                [ ' Forever Young ' , 107956187 ],
                [ ' A River Runs Through It ' , 31440294 ],
                [ ' Honeysuckle Rose ' , 12815212 ],
                [ ' Resurrection ' , 150297525 ],
                [ ' Taps ' , 21856053 ],
                [ ' On Golden Pond ' , 104285432 ],
                [ ' Absence of Malice ' , 28716963 ],
                [ ' The Night the Lights Went Out in Georgia ' , 7423752 ],
                [ ' Rocky III ' , 108052686 ],
                [ ' Tex ' , 544368315 ],
                [ ' Staying Alive ' , 42892670 ],
                [ ' Tender Mercies ' , 3943124 ],
                [ ' Footloose ' , 71808942 ],
                [ ' The Natural ' , 20000000 ],
    
    
            ],
            pointStart: 96
        }, {
            type: 'line',
            name: 'System Rating PG',
            data: [
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,  
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,  
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 
                3752564794, 3752564794, 3752564794, 3752564794,
            ],
            pointStart: 96,
            color: '#a10505'
        }, {
            name: 'System Rating PG-13',
            color: '#7a2f2f',
            data: [
                [ ' Gravity ' , 583698673 ],
                [ ' Sing ' , 559454789 ],
                [ ' Contagion ' , 77551594 ],
                [ ' Burlesque ' , 35552675 ],
                [ ' Creed II ' , 163591522 ],
                [ ' The Post ' , 129748880 ],
                [ ' Hereafter ' , 58660270 ],
                [ ' Anna Karenina ' , 22004627 ],
                [ ' Arrival ' , 156127894 ],
                [ ' Charlie St. Cloud ' , 4478084 ],
                [ ' Bridge of Spies ' , 122498338 ],
                [ ' The Impossible ' , 129590606 ],
                [ ' Water for Elephants ' , 78809717 ],
                [ ' Creed ' , 136567581 ],
                [ ' The Rite ' , 60143987 ],
                [ ' Collateral Beauty ' , 49309093 ],
                [ ' True Grit ' , 217276928 ],
                [ ' The Tree of Life ' , 26721826 ],
                [ ' The Longest Ride ' , 29802928 ],
                [ ' Step Up Revolution ' , 132552290 ],
                [ ' The Vow ' , 167618160 ],
                [ ' The Age of Adaline ' , 38984536 ],
                [ ' Safe Haven ' , 66050951 ],
                [ ' The Best of Me ' , 15059418 ],
                [ ' The Help ' , 188120004 ],
                [ ' Dear John ' , 117033509 ],
                [ ' The Lucky One ' , 71633833 ],
                [ ' The Giver ' , 41540205 ],
                [ ' Draft Day ' , 4847480 ],
                [ ' Rings ' , 57917283 ],
                [ ' Fences ' , 40282881 ],
                [ ' Me Before You ' , 188265198 ],
                [ ' The Light Between Oceans ' , 2281732 ],
                [ ' The Book Thief ' , 57086711 ],
                [ ' A Quiet Place ' , 317522294 ],
                [ ' Beastly ' , 21028230 ],
                [ ' The Roommate ' , 36545707 ],
                [ ' Remember Me ' , 40506120 ],
                [ ' The Woman in Black ' , 113955898 ],
                [ ' Country Strong ' , 5601987 ],
                [ ' One Day ' , 44168692 ],
                [ ' Suffragette ' , 20044909 ],
                [ ' The Perks of Being a Wallflower ' , 20069303 ],
                [ ' Project Almanac ' , 20909437 ],
                [ ' Wish Upon ' , 11477345 ],
                [ ' If I Stay ' , 67356170 ],
                [ ' Brooklyn ' , 51076141 ],
                [ ' Everything, Everything ' , 51603136 ],
                [ ' Mud ' , 21556959 ],
                [ ' Amour ' , 27087044 ],
                [ ' Ouija: Origin of Evil ' , 72831866 ],
                [ ' Black or White ' , 12971021 ],
                [ ' The Bye Bye Man ' , 23787727 ],
                [ ' Gifted ' , 29964656 ],
                [ ' The Words ' , 10369708 ],
                [ ' Lights Out ' , 143806510 ],
                [ ' Still Alice ' , 36699612 ],
                [ ' Before I Fall ' , 13945682 ],
                [ ' Rabbit Hole ' , 1205034 ],
                [ ' Ida ' , 12698355 ],
                [ ' Courageous ' , 33185884 ],
                [ ' Mustang ' , 4152584 ],
                [ ' Like Crazy ' , 3478400 ],
                [ ' Another Earth ' , 1927779 ]
            ],
            pointStart: 150
        }, {
            type: 'line',
            name: 'System Rating PG-13',
            data: [
                5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,  
                5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,  
                5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 
                5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 
                5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 
                5102398393, 5102398393, 5102398393, 5102398393,
            ],
            pointStart: 150,
            color: '#7a2f2f',
        },{
            name: 'System Rating G',
            color: '#4d0909',
            borderWidth: 1,
            data: [
               
                [ ' A Sunday in the Country ' , 1711143 ],
                [ ' Prancer ' , 11587135 ],
                [ ' The Rookie ' , 58693537 ],
                [ ' Beauty and the Beast 1991 ' , 418656843 ],
                [ ' The Little Rascals ' , 43947950 ],
                [ ' Ramona and Beezus ' , 12469621 ],
                [ ' The Black Stallion ' , 35099643 ],
                [ ' The Hunchback of Notre Dame ' , 255500000 ],
                [ ' Babe ' , 216100000 ],
                [ ' Pollyanna ' , 1250000 ],
                [ ' Lassie Come Home ' , 3851000 ],
                [ ' Charlotte\'s Web ' , 58985708 ],
                [ ' Kit Kittredge: An American Girl ' , 7657973 ],
                [ ' The Rookie ' , 58491516 ],
                [ ' The Secret Garden ' , 293281000 ],
                [ ' The Sound of Music ' , 278014195 ],
                [ ' The Tale of Despereaux ' , 30482317 ],
                [ ' Bambi 1942 ' , 267142000 ],
                [ ' My Fair Lady 1964 ' , 55071636 ],
                [ ' Hachiko: A Dog\'s Story ' , 37707417 ],
                [ ' Giant ' , 23794409 ],
                [ ' The Ten Commandments 1966 ' , 52500000 ],
                [ ' The Quiet Man ' , 5850377 ],
                [ ' Three Coins in the Fountain ' , 10300000 ],
    
            ],
            pointStart:216
        }, {
            type: 'line',
            name: 'System Rating G',
            data: [
                3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 
                3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288,
                3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 
                3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288
            ],
            pointStart: 216,
            color: '#4d0909'
        }]
    });
    

    Analysis

    Blueprint: Movies that did not make any Profit.

    This is the blueprint for creating the seventh visualization, Movies that did not make any Profit. Highcharts will be used to create this graph.

    Blueprint:

    • The graph used for this visualization is the Highcharts Lollipop series, which is found in the Highcharts Demos. Lollipop charts are variants of column charts, with a circle marker for the data value and a line extending to the axis. The first step is to understand the format of the script. This graph uses two kinds of code, HTML and JavaScript. JupyterLab has magic commands that support both: line magics start with '%' and cell magics, such as the ones used here, start with '%%'.

      Before the HTML and JavaScript are written, a dataframe needs to be made to extract all the movies that made a profit in the action and adventure genre. The two variables needed from the parent dataframe 'all_drama_info1' are the movie name and the profit as an integer.

      • HTML Section:

        To start, the magic command '%%HTML' should be the first line in the cell. The main scripted lines that will be used to create Highcharts graphs throughout this project are; For this particular graph this script will be used to get the lollipop effect; To close off the HTML script, this script is used, and to identify and differentiate this class from others the div id has to be named. If the graph is used more than once, each div id needs a different name or it will not work;

      • Javascript Section:

        To start, the magic command '%%js' is used to write JavaScript in the JupyterLab cell. The JavaScript section will have comments and explanations, but for further information go to the Highcharts Demos. The normal layout is horizontal, but for this graph it is going to be vertical. The main JavaScript used to create this lollipop graph is; name: ' The Sound of Music ' low: 2303109231 The name variable holds the name of the movie and the low variable holds the amount of profit made, as an integer.
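
    The name/low point format described in the blueprint can be generated from Python lists instead of being typed by hand. The following is only a minimal sketch: the helper name `lollipop_points` is hypothetical, and the three sample values are copied from the PG series earlier in this notebook. The printed JSON could then be pasted into the `data` array of the JavaScript cell.

```python
import json

def lollipop_points(names, profits):
    """Build Highcharts-style point dicts of the form {name, low}.

    Hypothetical helper, not part of the original notebook.
    """
    return [{"name": n.strip(), "low": int(p)} for n, p in zip(names, profits)]

# Sample values copied from the 'System Rating PG' series above.
names = [" Hugo ", " Dolphin Tale ", " Wonder "]
profits = [47784.0, 59068724.0, 284604712.0]

points = lollipop_points(names, profits)
print(json.dumps(points, indent=4))
```

    Stripping the padded titles and casting the profit to `int` keeps the pasted data free of stray spaces and trailing `.0` values.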

    This is the 'Drama_DataFrame' dataframe.

    In [452]:
    Drama_DataFrame
    
    Out[452]:
    Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x Worldwide_Gross Worldwide_Gross_x Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
    0 Hugo Nov 23, 2011 Drama PG 180000000.0 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 180047784 $180,047,784 47784.0 $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Asa Butterfield Martin Scorsese John Logan
    1 The Wolfman Feb 12, 2010 Drama R 150000000.0 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 142634358 $142,634,358 -7365642.0 $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Benicio Del Toro Joe Johnston Andrew Kevin Walker
    2 Gravity Oct 4, 2013 Drama PG-13 110000000.0 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 693698673 $693,698,673 583698673.0 $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. Sandra Bullock Alfonso Cuarón Alfonso Cuarón
    3 Django Unchained Dec 25, 2012 Drama R 100000000.0 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 449948323 $449,948,323 349948323.0 $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Jamie Foxx Quentin Tarantino Quentin Tarantino
    4 Sing Dec 21, 2016 Drama PG-13 75000000.0 $75,000,000 270329045 $270,329,045 363800000.0 $363,800,000 634454789 $634,454,789 559454789.0 $559,454,789 63445479 63,445,479 98.0 7.1 TriStar Pictures Lorraine Bracco Richard Baskin Dean Pitchford
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    301 A Dirty Shame September 24, 2004 Drama NC-17 15000000.0 $15,000,000 1339668 $1,339,668 574498.0 $574,498 1914166 $1,914,166 -13085834.0 $-13,085,834 191417 191,417 84.0 5.1 Killer Films Suzanne Shepherd John Waters John Waters
    302 Young Adam April 16, 2004 Drama NC-17 6400000.0 $6,400,000 767373 $767,373 1794447.0 $1,794,447 2561820 $2,561,820 -3838180.0 $-3,838,180 256182 256,182 98.0 6.4 Recorded Picture Company Tilda Swinton David Mackenzie \tDavid Mackenzie
    303 Whore 1991 October 4, 1991 Drama NC-17 50000.0 $50,000 0 $0 0.0 $0 1008404 $1,008,404 958404.0 $958,404 100840 100,840 80.0 5.5 Cheap Date Theresa Russell Ken Russell Deborah Dalton
    304 Ma Mère May 13, 2005 Drama NC-17 3259572.0 $3,259,572 71616 $71,616 950532.0 $950,532 1022148 $1,022,148 -2237424.0 $-2,237,424 102215 102,215 110.0 5.0 Gemini Films Louis Garrel Christophe Honoré Christophe Honoré
    305 Law of Desire April 3, 1987 Drama NC-17 612072.0 $612,072 0 $0 0.0 $0 1470809 $1,470,809 858737.0 $858,737 147081 147,081 82.0 7.1 El Deseo Antonio Banderas Pedro Almodóvar Pedro Almodóvar

    306 rows × 22 columns

    Getting the 'Cost' (production budget) of the Drama genre movies that had losses in the 'Drama_DataFrame' dataframe

    In [476]:
    bud_loss = []
    for i,x in enumerate(Drama_DataFrame.Profit):
        if x < 0:bud_loss.append(Drama_DataFrame.Production_Budget[i])
    print(bud_loss)
    
    [150000000.0, 68000000.0, 60000000.0, 50000000.0, 50000000.0, 40000000.0, 40000000.0, 35000000.0, 31000000.0, 30000000.0, 27500000.0, 25000000.0, 22000000.0, 21000000.0, 20000000.0, 18000000.0, 18000000.0, 18000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 13000000.0, 10000000.0, 7000000.0, 5000000.0, 4500000.0, 4357373.0, 4000000.0, 1000000.0, 1000000.0, 250000.0, 5100000.0, 72000000.0, 65000000.0, 9000000.0, 11000000.0, 45000000.0, 15000000.0, 10000000.0, 27000000.0, 25000000.0, 34000000.0, 15000000.0, 28300000.0, 8000000.0, 9000000.0, 15000000.0, 5000000.0, 4500000.0, 8000000.0, 16000000.0, 45000000.0, 5000000.0, 2734384.0, 35446775.0, 8600000.0, 18000000.0, 4400000.0, 17000000.0, 26000000.0, 6500000.0, 22000000.0, 90000000.0, 17000000.0, 300000.0, 3000000.0, 45000000.0, 10000000.0, 19000000.0, 1000000.0, 4700000.0, 3000000.0, 700000.0, 3200000.0, 1300000.0, 15000000.0, 6400000.0, 3259572.0]
    

    Getting the 'Revenue' (worldwide gross) of the Drama genre movies that had losses in the 'Drama_DataFrame' dataframe

    In [477]:
    rev_loss = []
    for i,x in enumerate(Drama_DataFrame.Profit):
        if x < 0:rev_loss.append(Drama_DataFrame.Worldwide_Gross[i])
    print(rev_loss)
    
    [142634358, 54462971, 47818913, 41642166, 26387039, 16340767, 31124367, 24687524, 15826984, 16481405, 15815509, 6792768, 4065020, 5046038, 3727746, 14189810, 7680250, 7719630, 8217571, 7585011, 11173718, 528731, 11831131, 2179623, 382946, 2821010, 1027760, 1200000, 679482, 852399, 354836, 62375, 534816, 37306334, 43545364, 3438735, 8526288, 35656130, 3987768, 7025496, 14859394, 10769960, 32255440, 2819485, 14920781, 3281232, 6668025, 199078, 4786789, 2044892, 2400000, 1705908, 20350754, 496059, 1022148, 195494, 1025228, 8721243, 40300, 10015449, 636796, 2447576, 9171289, 69131860, 10015449, 108998, 592861, 37750754, 4659110, 1236844, 205569, 2094302, 2783535, 103093, 690872, 627287, 1914166, 2561820, 1022148]
    

    Getting the 'Amount of Money Lost' for the Drama genre movies that had losses in the 'Drama_DataFrame' dataframe

    In [478]:
    money_loss = []
    for i,x in enumerate(Drama_DataFrame.Profit):
        if x < 0:money_loss.append(Drama_DataFrame.Profit[i])
    print(money_loss)
    
    [-7365642.0, -13537029.0, -12181087.0, -8357834.0, -23612961.0, -23659233.0, -8875633.0, -10312476.0, -15173016.0, -13518595.0, -11684491.0, -18207232.0, -17934980.0, -15953962.0, -16272254.0, -3810190.0, -10319750.0, -10280370.0, -7782429.0, -8414989.0, -3826282.0, -14471269.0, -1168869.0, -7820377.0, -6617054.0, -2178990.0, -3472240.0, -3157373.0, -3320518.0, -147601.0, -645164.0, -187625.0, -4565184.0, -34693666.0, -21454636.0, -5561265.0, -2473712.0, -9343870.0, -11012232.0, -2974504.0, -12140606.0, -14230040.0, -1744560.0, -12180515.0, -13379219.0, -4718768.0, -2331975.0, -14800922.0, -213211.0, -2455108.0, -5600000.0, -14294092.0, -24649246.0, -4503941.0, -1712236.0, -35251281.0, -7574772.0, -9278757.0, -4359700.0, -6984551.0, -25363204.0, -4052424.0, -12828711.0, -20868140.0, -6984551.0, -191002.0, -2407139.0, -7249246.0, -5340890.0, -17763156.0, -794431.0, -2605698.0, -216465.0, -596907.0, -2509128.0, -672713.0, -13085834.0, -3838180.0, -2237424.0]
    

    Getting the 'Names' of the Drama genre movies that had losses in the 'Drama_DataFrame' dataframe

    In [481]:
    name_loss = []
    for i,x in enumerate(Drama_DataFrame.Profit):
        if x < 0:name_loss.append(Drama_DataFrame.Movie[i])
    print(name_loss)
    
    ['The Wolfman', 'Downsizing', 'Trouble with the Curve', 'Dream House', 'Upside Down', 'Paranoia', 'Victor Frankenstein', 'Biutiful', 'Extraordinary Measures', 'The Space Between Us', 'Anonymous', 'Tulip Fever', 'Stone', 'The Beaver', 'By the Sea', 'Labor Day', 'Midnight Special', 'Miss Sloane', 'The Homesman', 'The Immigrant', 'Never Let Me Go', 'The Reluctant Fundamentalist', 'Chloe', 'Coriolanus', 'Hesher', 'Everything Must Go', 'Maggie', 'Anna', 'Stake Land', 'I Origins', 'The Invitation', 'The Canyons', 'Cattle Annie and Little Britches', 'The Majestic', 'We Are Marshall', 'The Ultimate Gift', 'What If...', 'The Indian in the Cupboard', 'Fluke', 'Three Wishes', 'Music of the Heart', 'Gettysburg', 'The Age of Innocence', 'Newsies', 'Ragtime', 'Looker', 'Six Weeks', 'Five Days One Summer', 'Eddie and the Cruisers', 'Testament', 'Table for Five', 'Man, Woman and Child', 'Showgirls', 'Bent', 'Ma mère', 'La traviata', 'Little Dorrit', 'The Secret Garden', 'Through the Olive Trees', 'A Little Princess', 'One from the Heart', 'The Hand', 'Pennies from Heaven', 'Babe: Pig in the City', 'A Little Princess', 'Before the Wrath', 'Miracle of Marcelino', 'Showgirls', 'Killer Joe', 'Queen of Hearts', 'Man Bites Dog', 'Nymphomaniac: Vol. I', 'Frontier(s)', 'Chained', 'The Big Feast', 'Orgazmo', 'A Dirty Shame', 'Young Adam', 'Ma Mère']
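
    The four cells above each loop over the Profit column and look values up by position. As a hedged alternative, the same four lists can be collected in a single pass with `zip`, which avoids relying on the dataframe index running from 0 to n-1. The function name `rows_with_losses` is an assumption made for this sketch, and the sample values are the first five rows of 'Drama_DataFrame' shown above. In the notebook, the dataframe columns (for example `Drama_DataFrame.Movie`) could be passed in place of the sample lists.

```python
def rows_with_losses(names, budgets, revenues, profits):
    """Return parallel lists for every movie whose profit is negative.

    Hypothetical helper: works on plain lists or on dataframe columns.
    """
    losers = [row for row in zip(names, budgets, revenues, profits) if row[3] < 0]
    if not losers:
        return [], [], [], []
    # Transpose the list of rows back into four column lists.
    name_col, bud_col, rev_col, loss_col = (list(col) for col in zip(*losers))
    return name_col, bud_col, rev_col, loss_col

# First five rows of Drama_DataFrame, as displayed above.
movie = ["Hugo", "The Wolfman", "Gravity", "Django Unchained", "Sing"]
budget = [180000000.0, 150000000.0, 110000000.0, 100000000.0, 75000000.0]
gross = [180047784, 142634358, 693698673, 449948323, 634454789]
profit = [47784.0, -7365642.0, 583698673.0, 349948323.0, 559454789.0]

sample_names, sample_budgets, sample_revenue, sample_losses = rows_with_losses(
    movie, budget, gross, profit
)
print(sample_names, sample_budgets, sample_revenue, sample_losses)
```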
    

    Getting the indices of the movies that have a budget of up to $8 million

    In [536]:
    stor1 = []
    for i,x in enumerate(bud_loss):
        if  x <= 8000000:stor1.append(i)
    len(stor1)
    
    Out[536]:
    27

    Using the indices in the variable 'stor1' to get the names of the movies that have a budget of up to $8 million

    In [552]:
    n1 = []
    for i in stor1:
        n1.append(name_loss[i])
    print(n1)
    
    ['Hesher', 'Everything Must Go', 'Maggie', 'Anna', 'Stake Land', 'I Origins', 'The Invitation', 'The Canyons', 'Cattle Annie and Little Britches', 'Looker', 'Eddie and the Cruisers', 'Testament', 'Table for Five', 'Bent', 'Ma mère', 'Through the Olive Trees', 'The Hand', 'Before the Wrath', 'Miracle of Marcelino', 'Man Bites Dog', 'Nymphomaniac: Vol. I', 'Frontier(s)', 'Chained', 'The Big Feast', 'Orgazmo', 'Young Adam', 'Ma Mère']
    

    Using the indices in the variable 'stor1' to get the budgets of the movies that have a budget of up to $8 million

    In [553]:
    b1 = []
    for i in stor1:
        b1.append(bud_loss[i])
    print(b1)
    
    [7000000.0, 5000000.0, 4500000.0, 4357373.0, 4000000.0, 1000000.0, 1000000.0, 250000.0, 5100000.0, 8000000.0, 5000000.0, 4500000.0, 8000000.0, 5000000.0, 2734384.0, 4400000.0, 6500000.0, 300000.0, 3000000.0, 1000000.0, 4700000.0, 3000000.0, 700000.0, 3200000.0, 1300000.0, 6400000.0, 3259572.0]
    

    Using the indices in the variable 'stor1' to get the revenue of the movies that have a budget of up to $8 million

    In [554]:
    r1 = []
    for i in stor1:
        r1.append(rev_loss[i])
    print(r1)
    
    [382946, 2821010, 1027760, 1200000, 679482, 852399, 354836, 62375, 534816, 3281232, 4786789, 2044892, 2400000, 496059, 1022148, 40300, 2447576, 108998, 592861, 205569, 2094302, 2783535, 103093, 690872, 627287, 2561820, 1022148]
    

    Using the indices in the variable 'stor1' to get the amount of money lost by the movies that have a budget of up to $8 million

    In [555]:
    l1 = []
    for i in stor1:
        l1.append(money_loss[i])
    print(l1)
    
    [-6617054.0, -2178990.0, -3472240.0, -3157373.0, -3320518.0, -147601.0, -645164.0, -187625.0, -4565184.0, -4718768.0, -213211.0, -2455108.0, -5600000.0, -4503941.0, -1712236.0, -4359700.0, -4052424.0, -191002.0, -2407139.0, -794431.0, -2605698.0, -216465.0, -596907.0, -2509128.0, -672713.0, -3838180.0, -2237424.0]
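
    Because the budget, revenue and loss lists are built separately, it can be worth checking that they still line up. Judging from the rows of 'Drama_DataFrame' shown earlier, Profit appears to equal Worldwide_Gross minus Production_Budget, so each budget plus its (negative) loss should equal the revenue. The sketch below applies that assumed relationship to the first four values of each list. The variable names are hypothetical so that the notebook's own lists are not overwritten.

```python
# First four values of b1, l1 and r1 from the outputs above.
budgets_sample = [7000000.0, 5000000.0, 4500000.0, 4357373.0]
losses_sample = [-6617054.0, -2178990.0, -3472240.0, -3157373.0]
revenue_sample = [382946, 2821010, 1027760, 1200000]

# Positions where budget + loss does not reproduce the revenue.
mismatches = [
    i
    for i, (b, l, r) in enumerate(zip(budgets_sample, losses_sample, revenue_sample))
    if b + l != r
]
print(mismatches)
```

    An empty list means the three lists are aligned for the rows that were checked.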
    

    Getting the indices of the movies that have a budget above $8 million and up to $21 million

    In [550]:
    stor2 = []
    for i,x in enumerate(bud_loss):
        if 8000000 < x <= 21000000:stor2.append(i)
    len(stor2)
    
    Out[550]:
    26

    Using the indices in the variable 'stor2' to get the names of the movies that have a budget above $8 million and up to $21 million

    In [566]:
    n2 = []
    for i in stor2:
        n2.append(name_loss[i])
    print(n2)
    
    ['The Beaver', 'By the Sea', 'Labor Day', 'Midnight Special', 'Miss Sloane', 'The Homesman', 'The Immigrant', 'Never Let Me Go', 'The Reluctant Fundamentalist', 'Chloe', 'Coriolanus', 'The Ultimate Gift', 'What If...', 'Fluke', 'Three Wishes', 'Newsies', 'Six Weeks', 'Five Days One Summer', 'Man, Woman and Child', 'Little Dorrit', 'The Secret Garden', 'A Little Princess', 'A Little Princess', 'Killer Joe', 'Queen of Hearts', 'A Dirty Shame']
    

    Using the indices in the variable 'stor2' to get the budgets of the movies that have a budget above $8 million and up to $21 million

    In [567]:
    b2 = []
    for i in stor2:
        b2.append(bud_loss[i])
    print(b2)
    
    [21000000.0, 20000000.0, 18000000.0, 18000000.0, 18000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 13000000.0, 10000000.0, 9000000.0, 11000000.0, 15000000.0, 10000000.0, 15000000.0, 9000000.0, 15000000.0, 16000000.0, 8600000.0, 18000000.0, 17000000.0, 17000000.0, 10000000.0, 19000000.0, 15000000.0]
    

    Using the indices in the variable 'stor2' to get the revenue of the movies that have a budget above $8 million and up to $21 million

    In [568]:
    r2 = []
    for i in stor2:
        r2.append(rev_loss[i])
    print(r2)
    
    [5046038, 3727746, 14189810, 7680250, 7719630, 8217571, 7585011, 11173718, 528731, 11831131, 2179623, 3438735, 8526288, 3987768, 7025496, 2819485, 6668025, 199078, 1705908, 1025228, 8721243, 10015449, 10015449, 4659110, 1236844, 1914166]
    

    Using the indices in the variable 'stor2' to get the amount of money lost by the movies that have a budget above $8 million and up to $21 million

    In [569]:
    l2 = []
    for i in stor2:
        l2.append(money_loss[i])
    print(l2)
    
    [-15953962.0, -16272254.0, -3810190.0, -10319750.0, -10280370.0, -7782429.0, -8414989.0, -3826282.0, -14471269.0, -1168869.0, -7820377.0, -5561265.0, -2473712.0, -11012232.0, -2974504.0, -12180515.0, -2331975.0, -14800922.0, -14294092.0, -7574772.0, -9278757.0, -6984551.0, -6984551.0, -5340890.0, -17763156.0, -13085834.0]
    

    Getting the indices of the movies that have a budget above $21 million

    In [551]:
    stor3 = []
    for i,x in enumerate(bud_loss):
        if x > 21000000:stor3.append(i)
    len(stor3)
    
    Out[551]:
    26

    Using the indices in the variable 'stor3' to get the names of the movies that have a budget above $21 million

    In [591]:
    n3 = []
    for i in stor3:
        n3.append(name_loss[i])
    print(n3)
    
    ['The Wolfman', 'Downsizing', 'Trouble with the Curve', 'Dream House', 'Upside Down', 'Paranoia', 'Victor Frankenstein', 'Biutiful', 'Extraordinary Measures', 'The Space Between Us', 'Anonymous', 'Tulip Fever', 'Stone', 'The Majestic', 'We Are Marshall', 'The Indian in the Cupboard', 'Music of the Heart', 'Gettysburg', 'The Age of Innocence', 'Ragtime', 'Showgirls', 'La traviata', 'One from the Heart', 'Pennies from Heaven', 'Babe: Pig in the City', 'Showgirls']
    

    Using the indices in the variable 'stor3' to get the budgets of the movies that have a budget above $21 million

    In [592]:
    b3 = []
    for i in stor3:
        b3.append(bud_loss[i])
    print(b3)
    
    [150000000.0, 68000000.0, 60000000.0, 50000000.0, 50000000.0, 40000000.0, 40000000.0, 35000000.0, 31000000.0, 30000000.0, 27500000.0, 25000000.0, 22000000.0, 72000000.0, 65000000.0, 45000000.0, 27000000.0, 25000000.0, 34000000.0, 28300000.0, 45000000.0, 35446775.0, 26000000.0, 22000000.0, 90000000.0, 45000000.0]
    

    Using the indices in the variable 'stor3' to get the revenue of the movies that have a budget above $21 million

    In [593]:
    r3 = []
    for i in stor3:
        r3.append(rev_loss[i])
    print(r3)
    
    [142634358, 54462971, 47818913, 41642166, 26387039, 16340767, 31124367, 24687524, 15826984, 16481405, 15815509, 6792768, 4065020, 37306334, 43545364, 35656130, 14859394, 10769960, 32255440, 14920781, 20350754, 195494, 636796, 9171289, 69131860, 37750754]
    

    Using the indices in the variable 'stor3' to get the amount of money lost by the movies that have a budget above $21 million

    In [594]:
    l3 = []
    for i in stor3:
        l3.append(money_loss[i])
    print(l3)
    
    [-7365642.0, -13537029.0, -12181087.0, -8357834.0, -23612961.0, -23659233.0, -8875633.0, -10312476.0, -15173016.0, -13518595.0, -11684491.0, -18207232.0, -17934980.0, -34693666.0, -21454636.0, -9343870.0, -12140606.0, -14230040.0, -1744560.0, -13379219.0, -24649246.0, -35251281.0, -25363204.0, -12828711.0, -20868140.0, -7249246.0]
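
    The three budget bands above (up to $8 million, above $8 million and up to $21 million, above $21 million) are each built with a separate loop. One possible way to assign every budget to its band in a single pass is the standard-library `bisect` module. The names `EDGES`, `LABELS`, `band` and `split_by_band` are assumptions made for this sketch, and the sample budgets are taken from the 'bud_loss' output above.

```python
from bisect import bisect_left

EDGES = [8_000_000, 21_000_000]  # inclusive upper bounds of the first two bands
LABELS = ["up to $8M", "$8M to $21M", "above $21M"]

def band(budget):
    """Return 0, 1 or 2 for the band a budget falls into.

    bisect_left places a value equal to an edge in the lower band,
    matching the '<=' comparisons used in the cells above.
    """
    return bisect_left(EDGES, budget)

def split_by_band(budgets):
    """Group the positions of the budgets by band label."""
    groups = {label: [] for label in LABELS}
    for i, b in enumerate(budgets):
        groups[LABELS[band(b)]].append(i)
    return groups

# A few budgets from bud_loss, including both boundary values.
sample = [150000000.0, 21000000.0, 8000000.0, 8600000.0, 250000.0]
groups = split_by_band(sample)
print(groups)
```

    The resulting index lists play the same role as 'stor1', 'stor2' and 'stor3'.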
    

    This is 'Part One' of the HTML script that uses the Highcharts library to visualize the losses of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a 'Column Series'. This is done with the JavaScript and HTML below, and the chart will be saved as a .png file.

    In [69]:
    %%HTML
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/export-data.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
    
    <figure class="highcharts-figure">
        <div id="w"></div> 
        <p class="highcharts-description">
        </p>
    </figure>
    

    In [70]:
    %%js
    Highcharts.chart('w',{
        chart:{
            type:'column',
            height:400,
            width:700
        },
        title:{
            text:'Movies That Did Not Make Any Profit'
        },
        xAxis:{
            categories:['Hesher', 'Everything Must Go', 'Maggie', 'Anna', 'Stake Land', 'I Origins', 'The Invitation', 'The Canyons', 'Cattle Annie and Little Britches', 
                        'Looker', 'Eddie and the Cruisers', 'Testament', 'Table for Five', 'Bent', 'Ma mère', 'Through the Olive Trees', 'The Hand', 'Before the Wrath', 
                        'Miracle of Marcelino', 'Man Bites Dog', 'Nymphomaniac: Vol. I', 'Frontier(s)', 'Chained', 'The Big Feast', 'Orgazmo', 'Young Adam', 'Ma Mère'],
            labels:{
               enabled:false 
            }
        },
        credits:{
            enabled:false
        },
        colors:['red','#900505','#FF5F84'],
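        yAxis:{
            min:-7000000,        // the largest loss is -6,617,054, so a minimum of -6,000,000 would clip it
            max:8000000,
            tickInterval:500000  // tick spacing on the axis; 'step' is not a yAxis option
        },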
        series:[{
            name:'Cost',
            data:[7000000.0, 5000000.0, 4500000.0, 4357373.0, 4000000.0, 1000000.0, 1000000.0, 250000.0, 5100000.0, 8000000.0, 5000000.0, 
                  4500000.0, 8000000.0, 5000000.0, 2734384.0, 4400000.0, 6500000.0, 300000.0, 3000000.0, 1000000.0, 4700000.0, 3000000.0,
                  700000.0, 3200000.0, 1300000.0, 6400000.0, 3259572.0 ]
        },{
            name:'Loss',
            data:[-6617054.0, -2178990.0, -3472240.0, -3157373.0, -3320518.0, -147601.0, -645164.0, -187625.0, -4565184.0, -4718768.0, 
                  -213211.0, -2455108.0, -5600000.0, -4503941.0, -1712236.0, -4359700.0, -4052424.0, -191002.0, -2407139.0, -794431.0, 
                  -2605698.0, -216465.0, -596907.0, -2509128.0, -672713.0, -3838180.0, -2237424.0]
        },{
            name:'Revenue',
            data:[382946, 2821010, 1027760, 1200000, 679482, 852399, 354836, 62375, 534816, 3281232, 4786789, 2044892, 2400000, 496059, 
                  1022148, 40300, 2447576, 108998, 592861, 205569, 2094302, 2783535, 103093, 690872, 627287, 2561820, 1022148]
        }]
    });
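Each value in the 'Loss' series above equals revenue minus cost for the same movie (for example, Hesher: 382946 - 7000000 = -6617054). The following is a minimal Python sketch of how such a series could be derived and serialized before pasting it into the Highcharts cell. The helper name `loss_series` is hypothetical and is not part of the original notebook.

```python
import json


def loss_series(cost, revenue):
    """Return revenue - cost for each movie (negative values are losses)."""
    if len(cost) != len(revenue):
        raise ValueError("cost and revenue must have the same length")
    return [r - c for c, r in zip(cost, revenue)]


# First three movies from the 'Part One' chart
cost = [7000000.0, 5000000.0, 4500000.0]
revenue = [382946, 2821010, 1027760]

loss = loss_series(cost, revenue)

# json.dumps produces an array literal that JavaScript can read as-is
print(json.dumps(loss))
```

Deriving the series this way keeps the 'Cost', 'Revenue' and 'Loss' arrays aligned, because a length mismatch raises an error instead of silently shifting the columns.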
    

    This is 'Part Two' of the HTML script that uses the Highcharts library to visualize the losses of the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a 'Column Series'. The chart is built with the HTML and JavaScript below and will be saved as a .png file.

    In [67]:
    %%HTML
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/export-data.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
    
    <figure class="highcharts-figure">
        <div id="f"></div> 
        <p class="highcharts-description">
        </p>
    </figure>
    

    In [68]:
    %%js
    Highcharts.chart('f',{
        chart:{
            type:'column',
            height:400,
            width:700
        },
        title:{
            text:'Movies That Did Not Make Any Profit'
        },
        xAxis:{
            categories:['The Beaver', 'By the Sea', 'Labor Day', 'Midnight Special', 'Miss Sloane', 'The Homesman', 'The Immigrant', 'Never Let Me Go', 
                        'The Reluctant Fundamentalist', 'Chloe', 'Coriolanus', 'The Ultimate Gift', 'What If...', 'Fluke', 'Three Wishes', 'Newsies', 
                        'Six Weeks', 'Five Days One Summer', 'Man, Woman and Child', 'Little Dorrit', 'The Secret Garden', 'A Little Princess', 
                        'A Little Princess', 'Killer Joe', 'Queen of Hearts', 'A Dirty Shame'
                    ],
            labels:{
               enabled:false
            }},
        credits:{
            enabled:false
        },
        colors:['red','#900505','#FF5F84'],
        // A single yAxis block: a repeated key would silently override the first one
        yAxis:{
            labels:{
                enabled:true
            },
            min:-15000000,
            max:25000000,
            step:5000000,
        },
        series:[{
            name:'Cost',
            data: [21000000.0, 20000000.0, 18000000.0, 18000000.0, 18000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 13000000.0, 10000000.0, 
                   9000000.0, 11000000.0, 15000000.0, 10000000.0, 15000000.0, 9000000.0, 15000000.0, 16000000.0, 8600000.0, 18000000.0, 17000000.0, 
                   17000000.0, 10000000.0, 19000000.0, 15000000.0 ]
        },{
            name:'Loss',
            // Loss = Revenue - Cost for each movie in 'categories'
            data:[-15953962.0, -16272254.0, -3810190.0, -10319750.0, -10280370.0, -7782429.0, -8414989.0, -3826282.0, -14471269.0, -1168869.0, 
                  -7820377.0, -5561265.0, -2473712.0, -11012232.0, -2974504.0, -12180515.0, -2331975.0, -14800922.0, -14294092.0, -7574772.0, 
                  -9278757.0, -6984551.0, -6984551.0, -5340890.0, -17763156.0, -13085834.0]
        },{
            name:'Revenue',
            data:[5046038, 3727746, 14189810, 7680250, 7719630, 8217571, 7585011, 11173718, 528731, 11831131, 2179623, 3438735, 8526288, 3987768, 7025496, 
                  2819485, 6668025, 199078, 1705908, 1025228, 8721243, 10015449, 10015449, 4659110, 1236844, 1914166]
        }]
    });
    

    This is 'Part Three' of the HTML script that uses the Highcharts library to visualize the losses of the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a 'Column Series'. The chart is built with the HTML and JavaScript below and will be saved as a .png file.

    In [65]:
    %%HTML
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/export-data.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
    
    <figure class="highcharts-figure">
        <div id="k"></div> 
        <p class="highcharts-description">
        </p>
    </figure>
    

    In [66]:
    %%js
    Highcharts.chart('k',{
        chart:{
            type:'column',
            height:400,
            width:700
        },
        title:{
            text:'Movies That Did Not Make Any Profit'
        },
        xAxis:{
            categories:['Downsizing', 'Trouble with the Curve', 'Dream House', 'Upside Down', 'Paranoia', 'Victor Frankenstein', 'Biutiful', 'Extraordinary Measures', 
                        'The Space Between Us', 'Anonymous', 'Tulip Fever', 'Stone', 'The Majestic', 'We Are Marshall', 'The Indian in the Cupboard', 'Music of the Heart', 'Gettysburg', 
                        'The Age of Innocence', 'Ragtime', 'Showgirls', 'La traviata', 'One from the Heart', 'Pennies from Heaven', 'Babe: Pig in the City', 'Showgirls'
                    ],
            labels:{
               enabled:false
            }},
        credits:{
            enabled:false
        },
        colors:['red','#900505','#FF5F84'],
        // A single yAxis block: a repeated key would silently override the first one
        yAxis:{
            labels:{
                enabled:true
            },
            min:-35000000,
            max:70000000,
            step:500000,
        },
        series:[{
            name:'Cost',
            data: [68000000.0, 60000000.0, 50000000.0, 50000000.0, 40000000.0, 40000000.0, 35000000.0, 31000000.0, 30000000.0, 
                   27500000.0, 25000000.0, 22000000.0, 72000000.0, 65000000.0, 45000000.0, 27000000.0, 25000000.0, 34000000.0, 28300000.0, 
                   45000000.0, 35446775.0, 26000000.0, 22000000.0, 90000000.0, 45000000.0]
        },{
            name:'Loss',
            data:[-13537029.0, -12181087.0, -8357834.0, -23612961.0, -23659233.0, -8875633.0, -10312476.0, -15173016.0, -13518595.0, 
                  -11684491.0, -18207232.0, -17934980.0, -34693666.0, -21454636.0, -9343870.0, -12140606.0, -14230040.0, -1744560.0, -13379219.0, 
                  -24649246.0, -35251281.0, -25363204.0, -12828711.0, -20868140.0, -7249246.0 ]
        },{
            name:'Revenue',
            data:[54462971, 47818913, 41642166, 26387039, 16340767, 31124367, 24687524, 15826984, 16481405, 15815509, 6792768, 4065020, 37306334, 
                  43545364, 35656130, 14859394, 10769960, 32255440, 14920781, 20350754, 195494, 636796, 9171289, 69131860, 37750754]
        }]
    });
    

    Blueprint: Top 20 Highest Profitable Movies and Top 20 Lowest Profitable Movies

    This is the blueprint for creating the ninth and tenth visualizations, the Top 20 Highest Profitable Movies and the Top 20 Lowest Profitable Movies. Highcharts will be used to create these graphs.

    Blueprint:

    • The chart used for the Top 20 Highest Profitable Movies is the Highcharts 3D Cylinder Chart, and the chart used for the Top 20 Lowest Profitable Movies is the Highcharts Chart Rotation series; both are found in the Highcharts demos. The chart rotation series has alpha, beta and depth angles, which can be adjusted to rotate the graph.

      The first approach to this chart is understanding the format of the code. Like the previous graph, this chart is built from two types of code, HTML and JavaScript. The HTML section is very similar to that of the previous graph; the only difference is the additional HTML code that allows the graphs to be 3D and cylinder shaped. The closing part of the HTML is also similar to the previous script, but each graph has its own unique name.

      This is for the cylinder graph;

      This is for the 3D column graph;

    The second approach is scripting the JavaScript section. The JavaScript section is very simple, but before writing it some data needs to be extracted from the parent dataframe. For the Top 20 Highest Profitable Movies, the name and the amount of profit made are needed; for the Top 20 Lowest Profitable Movies, the name and the amount of profit made, as an integer, are needed for all the movies in that category.

    Only one sub-section in each graph needs particular attention: the data sub-section in the series section. It is a sequence of lists, each consisting of the name of a movie and the amount of profit it made as an integer.

    This is from the first graph; ['Avatar|1st Highest',2351345279], ['The Sound of Music|2nd Highest', 2303109231], ['Black Panther|3rd Highest',1148258224]

    This is from the second graph; [' Miami Vice|170th Highest', 28818556 ], [' Space Chimps|171th Highest', 28097693 ], [' The Tale of Despereaux|172th Highest', 26957280 ],

    A for loop is used to put the names and profits of the top 20 highest profitable movies into HTML code, which will be pasted in the cell below.
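The loop described above could look roughly like the following sketch. It assumes the 'Name|Nth Highest' label format shown in the two examples; the helper names `ordinal` and `highcharts_points` are hypothetical and do not come from the original notebook. The sketch uses the standard English suffixes, so rank 171 becomes '171st'.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def highcharts_points(names, profits, start_rank=1):
    """Build one "['Name|Nth Highest', profit]," line per movie."""
    lines = []
    for rank, (name, profit) in enumerate(zip(names, profits), start=start_rank):
        # Profits are written as integers, as in the blueprint examples
        lines.append(f"['{name}|{ordinal(rank)} Highest', {int(profit)}],")
    return lines


names = ['The Lion King 1994', 'Gravity', 'Sing']
profits = [941214868.0, 583698673.0, 559454789.0]

for line in highcharts_points(names, profits):
    print(line)
```

Movie names that contain a single quote would need escaping before the generated lines are pasted into the JavaScript cell; the sketch does not handle that case.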

    This is the 'Drama_DataFrame' dataframe.

    In [628]:
    Drama_DataFrame
    
    Out[628]:
    Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x Worldwide_Gross Worldwide_Gross_x Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
    0 Hugo Nov 23, 2011 Drama PG 180000000.0 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 180047784 $180,047,784 47784.0 $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Asa Butterfield Martin Scorsese John Logan
    1 The Wolfman Feb 12, 2010 Drama R 150000000.0 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 142634358 $142,634,358 -7365642.0 $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Benicio Del Toro Joe Johnston Andrew Kevin Walker
    2 Gravity Oct 4, 2013 Drama PG-13 110000000.0 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 693698673 $693,698,673 583698673.0 $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. Sandra Bullock Alfonso Cuarón Alfonso Cuarón
    3 Django Unchained Dec 25, 2012 Drama R 100000000.0 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 449948323 $449,948,323 349948323.0 $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Jamie Foxx Quentin Tarantino Quentin Tarantino
    4 Sing Dec 21, 2016 Drama PG-13 75000000.0 $75,000,000 270329045 $270,329,045 363800000.0 $363,800,000 634454789 $634,454,789 559454789.0 $559,454,789 63445479 63,445,479 98.0 7.1 TriStar Pictures Lorraine Bracco Richard Baskin Dean Pitchford
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    301 A Dirty Shame September 24, 2004 Drama NC-17 15000000.0 $15,000,000 1339668 $1,339,668 574498.0 $574,498 1914166 $1,914,166 -13085834.0 $-13,085,834 191417 191,417 84.0 5.1 Killer Films Suzanne Shepherd John Waters John Waters
    302 Young Adam April 16, 2004 Drama NC-17 6400000.0 $6,400,000 767373 $767,373 1794447.0 $1,794,447 2561820 $2,561,820 -3838180.0 $-3,838,180 256182 256,182 98.0 6.4 Recorded Picture Company Tilda Swinton David Mackenzie \tDavid Mackenzie
    303 Whore 1991 October 4, 1991 Drama NC-17 50000.0 $50,000 0 $0 0.0 $0 1008404 $1,008,404 958404.0 $958,404 100840 100,840 80.0 5.5 Cheap Date Theresa Russell Ken Russell Deborah Dalton
    304 Ma Mère May 13, 2005 Drama NC-17 3259572.0 $3,259,572 71616 $71,616 950532.0 $950,532 1022148 $1,022,148 -2237424.0 $-2,237,424 102215 102,215 110.0 5.0 Gemini Films Louis Garrel Christophe Honoré Christophe Honoré
    305 Law of Desire April 3, 1987 Drama NC-17 612072.0 $612,072 0 $0 0.0 $0 1470809 $1,470,809 858737.0 $858,737 147081 147,081 82.0 7.1 El Deseo Antonio Banderas Pedro Almodóvar Pedro Almodóvar

    306 rows × 22 columns

    Creating a list of all the 'Movies Profits' from 'Drama_DataFrame' dataframe.

    In [736]:
    # Copy the 'Profit' column into a plain Python list
    profit_all = list(Drama_DataFrame.Profit)
    

    Creating a list of all the 'Movies Names' from 'Drama_DataFrame' dataframe.

    In [631]:
    # Copy the 'Movie' column into a plain Python list
    name_all = list(Drama_DataFrame.Movie)
    

    Creating a list of all the 'Movies System Ratings' from 'Drama_DataFrame' dataframe.

    In [633]:
    # Copy the 'Rating' column into a plain Python list
    system_ratings = list(Drama_DataFrame.Rating)
    

    Creating a list of tuples, each consisting of a movie's name, profit and system rating.

    In [635]:
    # Pair up the three lists element by element: (name, profit, rating)
    all_to = list(zip(name_all, profit_all, system_ratings))
    

    After the 'all_to' list is created, it is sorted by the 'Profit' of each movie in descending order, and the 'Top 20 Highest Profitable Movies' are printed.

    In [636]:
    all_to.sort(key=lambda i:i[1],reverse=True)
    print(all_to[:20])
    
    [('The Lion King 1994', 941214868.0, 'G'), ('Gravity', 583698673.0, 'PG-13'), ('Sing', 559454789.0, 'PG-13'), ('Tex', 544368315.0, 'PG'), ('Fifty Shades of Grey', 530998101.0, 'R'), ('Cinderella', 447351353.0, 'PG'), ('Beauty and the Beast 1991', 418656843.0, 'G'), ('Django Unchained', 349948323.0, 'R'), ('Fifty Shades Darker', 326398492.0, 'R'), ('Black Swan', 318266710.0, 'R'), ('A Quiet Place', 317522294.0, 'PG-13'), ('Fifty Shades Freed', 316350619.0, 'R'), ('Gone Girl', 307567189.0, 'R'), ('The Secret Garden', 293281000.0, 'G'), ('Wonder', 285937718.0, 'PG'), ('Wonder', 284604712.0, 'PG'), ('The Sound of Music', 278014195.0, 'G'), ('Bambi 1942', 267142000.0, 'G'), ('The Hunchback of Notre Drame', 255500000.0, 'G'), ('True Grit', 217276928.0, 'PG-13')]
    

    Getting the 'Profit' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [659]:
    # Profit is the second element of each (name, profit, rating) tuple
    twenty_num = [i[1] for i in all_to[:20]]
    print(twenty_num)
    
    [941214868.0, 583698673.0, 559454789.0, 544368315.0, 530998101.0, 447351353.0, 418656843.0, 349948323.0, 326398492.0, 318266710.0, 317522294.0, 316350619.0, 307567189.0, 293281000.0, 285937718.0, 284604712.0, 278014195.0, 267142000.0, 255500000.0, 217276928.0]
    

    Getting the 'Names' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [700]:
    # Name is the first element of each (name, profit, rating) tuple
    twenty_name = [i[0] for i in all_to[:20]]
    print(twenty_name)
    
    ['The Lion King 1994', 'Gravity', 'Sing', 'Tex', 'Fifty Shades of Grey', 'Cinderella', 'Beauty and the Beast 1991', 'Django Unchained', 'Fifty Shades Darker', 'Black Swan', 'A Quiet Place', 'Fifty Shades Freed', 'Gone Girl', 'The Secret Garden', 'Wonder', 'Wonder', 'The Sound of Music', 'Bambi 1942', 'The Hunchback of Notre Drame', 'True Grit']
    

    Getting the 'System Ratings' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [701]:
    # Rating is the third element of each (name, profit, rating) tuple
    twenty_rat = [i[2] for i in all_to[:20]]
    print(twenty_rat)
    
    ['G', 'PG-13', 'PG-13', 'PG', 'R', 'PG', 'G', 'R', 'R', 'R', 'PG-13', 'R', 'R', 'G', 'PG', 'PG', 'G', 'G', 'G', 'PG-13']
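The sort-and-slice steps above, followed by the three extraction loops, can also be written more compactly. The sketch below is one possible alternative, not the notebook's method: it uses `heapq.nlargest` from the standard library on a handful of rows taken from the outputs above, and then unzips the result into three parallel sequences. The variable names `top`, `top_names`, `top_profits` and `top_ratings` are illustrative.

```python
import heapq

# A few (name, profit, rating) tuples in the same shape as 'all_to'
all_to = [
    ('Gravity', 583698673.0, 'PG-13'),
    ('Hugo', 47784.0, 'PG'),
    ('The Lion King 1994', 941214868.0, 'G'),
    ('Young Adam', -3838180.0, 'NC-17'),
    ('Sing', 559454789.0, 'PG-13'),
]

# The n most profitable entries, highest profit first
top = heapq.nlargest(3, all_to, key=lambda row: row[1])

# Unzip the tuples into names, profits and ratings in one step
top_names, top_profits, top_ratings = zip(*top)

print(top_names)
print(top_profits)
print(top_ratings)
```

Unlike `all_to.sort(...)`, this leaves the original list in its existing order, which can be convenient when the same list is reused for other selections.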
    

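    The 'all_to' list is sorted again by the 'Profit' of each movie in descending order, and the 'Top 20 Lowest Profitable Movies' are printed. The slice [-99:-79] skips the final 79 entries of the sorted list, apparently to leave out the movies that made a loss, so the selection ends with the smallest profit shown in the output.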

    In [752]:
    all_to.sort(key=lambda i:i[1],reverse=True)
    print(all_to[-99:-79])
    
    [('Two Girls and a Guy', 1315026.0, 'NC-17'), ('Pollyanna', 1250000.0, 'G'), ('Rabbit Hole', 1205034.0, 'PG-13'), ('Bad Lieutenant', 1038916.0, 'NC-17'), ('Whore 1991', 958404.0, 'NC-17'), ('Law of Desire', 858737.0, 'NC-17'), ('Wide Sargasso Sea', 659312.0, 'NC-17'), ('Zoot Suit', 556082.0, 'R'), ('Pink Flamingos', 401802.0, 'NC-17'), ('The Dreamers', 307113.0, 'NC-17'), ('Sound of My Voice', 294448.0, 'R'), ('Tokyo Decadence', 257845.0, 'NC-17'), ('Elles', 256669.0, 'NC-17'), ('Take Shelter', 222016.0, 'R'), ('Palo Alto', 156309.0, 'R'), ('The Dreamers', 121165.0, 'NC-17'), ('Locke', 88390.0, 'R'), ('Hugo', 47784.0, 'PG'), ('Stoker', 34913.0, 'R'), ('Whore', 8404.0, 'NC-17')]
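The hard-coded slice [-99:-79] only gives the intended result while exactly 79 entries follow the selection in the sorted list. A sketch of a selection that does not depend on that count is shown below. It assumes that 'lowest profitable' means the smallest profits that are still positive, which is what the output above suggests; the function name `lowest_profitable` is hypothetical.

```python
import heapq


def lowest_profitable(rows, n=20):
    """Return the n rows with the smallest positive profit, highest first."""
    # Keep only movies that actually made a profit
    profitable = [row for row in rows if row[1] > 0]
    lowest = heapq.nsmallest(n, profitable, key=lambda row: row[1])
    # Present the result in descending order, like the sorted 'all_to' list
    return sorted(lowest, key=lambda row: row[1], reverse=True)


rows = [
    ('Gravity', 583698673.0, 'PG-13'),
    ('Hugo', 47784.0, 'PG'),
    ('Stoker', 34913.0, 'R'),
    ('Whore', 8404.0, 'NC-17'),
    ('Young Adam', -3838180.0, 'NC-17'),
    ('Ma Mère', -2237424.0, 'NC-17'),
]

print(lowest_profitable(rows, n=3))
```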
    

    Getting the 'Profit' of the movies that are the 'Top 20 Lowest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [707]:
    # Profit is the second element of each (name, profit, rating) tuple
    twenty_num1 = [i[1] for i in all_to[-99:-79]]
    print(twenty_num1)
    
    [1315026.0, 1250000.0, 1205034.0, 1038916.0, 958404.0, 858737.0, 659312.0, 556082.0, 401802.0, 307113.0, 294448.0, 257845.0, 256669.0, 222016.0, 156309.0, 121165.0, 88390.0, 47784.0, 34913.0, 8404.0]
    

    Getting the 'Names' of the movies that are the 'Top 20 Lowest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [708]:
    # Name is the first element of each (name, profit, rating) tuple
    twenty_name1 = [i[0] for i in all_to[-99:-79]]
    print(twenty_name1)
    
    ['Two Girls and a Guy', 'Pollyanna', 'Rabbit Hole', 'Bad Lieutenant', 'Whore 1991', 'Law of Desire', 'Wide Sargasso Sea', 'Zoot Suit', 'Pink Flamingos', 'The Dreamers', 'Sound of My Voice', 'Tokyo Decadence', 'Elles', 'Take Shelter', 'Palo Alto', 'The Dreamers', 'Locke', 'Hugo', 'Stoker', 'Whore']
    

    Getting the 'System Ratings' of the movies that are the 'Top 20 Lowest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [709]:
    # Rating is the third element of each (name, profit, rating) tuple
    twenty_rat1 = [i[2] for i in all_to[-99:-79]]
    print(twenty_rat1)
    
    ['NC-17', 'G', 'PG-13', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'R', 'NC-17', 'NC-17', 'R', 'NC-17', 'NC-17', 'R', 'R', 'NC-17', 'R', 'PG', 'R', 'NC-17']
    

    Matching HTML colors to the System Rating of each movie within the 'Top 20 Lowest Profitable Movies', for the graph below.

    In [717]:
    # HTML color assigned to each system rating
    rating_colors = {'NC-17': "#DC143C", 'R': "#8B0000", 'PG': "#CD5C5C", 'PG-13': "#FAB072", 'G': "#A45A52"}
    # Ratings without an assigned color are skipped
    color1 = [rating_colors[i[2]] for i in all_to[-99:-79] if i[2] in rating_colors]
    print(color1)
    
    ['#DC143C', '#A45A52', '#FAB072', '#DC143C', '#DC143C', '#DC143C', '#DC143C', '#8B0000', '#DC143C', '#DC143C', '#8B0000', '#DC143C', '#DC143C', '#8B0000', '#8B0000', '#DC143C', '#8B0000', '#CD5C5C', '#8B0000', '#DC143C']
    

    Matching HTML colors to the System Rating of each movie within the 'Top 20 Highest Profitable Movies', for the graph below.

    In [718]:
    # HTML color assigned to each system rating
    rating_colors = {'NC-17': "#DC143C", 'R': "#8B0000", 'PG': "#CD5C5C", 'PG-13': "#FAB072", 'G': "#A45A52"}
    # Ratings without an assigned color are skipped
    color2 = [rating_colors[i[2]] for i in all_to[:20] if i[2] in rating_colors]
    print(color2)
    
    ['#A45A52', '#FAB072', '#FAB072', '#CD5C5C', '#8B0000', '#CD5C5C', '#A45A52', '#8B0000', '#8B0000', '#8B0000', '#FAB072', '#8B0000', '#8B0000', '#A45A52', '#CD5C5C', '#CD5C5C', '#A45A52', '#A45A52', '#A45A52', '#FAB072']
    

    This is the HTML script that uses the Highcharts library to visualize the 'Top 20 Highest Profitable Movies' of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a '3D Cylinder Series'. The chart is built with the HTML and JavaScript below.

    In [61]:
    %%HTML
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/highcharts-3d.js"></script>
    <script src="https://code.highcharts.com/modules/cylinder.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/export-data.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
    <figure class="highcharts-figure">
        <div id="pete1"></div>
    </figure>
    
    In [62]:
    %%js
    Highcharts.chart('pete1', {
        chart: {
            width:650,
            height:500,
            type: 'cylinder',
            options3d: {
                enabled: true,
                alpha: 15,
                beta: 15,
                depth: 50,
                viewDistance: 25
            }
        },
        title: {
            text: 'Top 20 Highest Profitable Movies'
        },
        plotOptions: {
            series: {
                depth: 25,
                colorByPoint: false,
                color: "#EC5800",
    
            }
        },
        xAxis: {
            categories:['The Lion King 1994', 'Gravity', 'Sing', 'Tex', 'Fifty Shades of Grey', 'Cinderella', 'Beauty and the Beast 1991', 'Django Unchained', 'Fifty Shades Darker', 
                        'Black Swan', 'A Quiet Place', 'Fifty Shades Freed', 'Gone Girl', 'The Secret Garden', 'Wonder', 'Wonder', 'The Sound of Music', 'Bambi 1942', 
                        'The Hunchback of Notre Drame', 'True Grit'],
            labels: {
                skew3d: true,
                style: {
                    fontSize: '16px'
                }
            }
        },
        series: [{
            data: [941214868.0, 583698673.0, 559454789.0, 544368315.0, 530998101.0, 447351353.0, 418656843.0, 349948323.0, 326398492.0, 318266710.0, 
                   317522294.0, 316350619.0, 307567189.0, 293281000.0, 285937718.0, 284604712.0, 278014195.0, 267142000.0, 255500000.0, 217276928.0],
            name: 'Profit',
            showInLegend: false
        }]
    });
    // Look up the chart rendered into 'pete1' so the slider helpers can reference it
    var chart = Highcharts.charts.find(c => c && c.renderTo.id === 'pete1');

    // Show the current 3D angles, if the demo's value elements exist on the page
    function showValues() {
        ['alpha', 'beta', 'depth'].forEach(key => {
            var el = document.getElementById(key + '-value');
            if (el) el.innerHTML = chart.options.chart.options3d[key];
        });
    }

    // Activate the sliders (nothing happens when the page has no '#sliders' inputs)
    document.querySelectorAll('#sliders input').forEach(input => input.addEventListener('input', e => {
        chart.options.chart.options3d[e.target.id] = parseFloat(e.target.value);
        showValues();
        chart.redraw(false);
    }));

    showValues();
    

    This is the HTML script that uses the Highcharts library to visualize the 'Top 20 Lowest Profitable Movies' of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a '3D Cylinder Series'. The chart is built with the HTML and JavaScript below.

    In [63]:
    %%html
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/highcharts-3d.js"></script>
    <script src="https://code.highcharts.com/modules/cylinder.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/export-data.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
    
    <figure class="highcharts-figure">
        <div id="pete2"></div>
    </figure>
    
    In [64]:
    %%js
    Highcharts.chart('pete2', {
        chart: {
            width:650,
            height:500,
            type: 'cylinder',
            options3d: {
                enabled: true,
                alpha: 15,
                beta: 15,
                depth: 50,
                viewDistance: 25
            }
        },
        title: {
            text: 'Top 20 Lowest Profitable Movies'
        },
        plotOptions: {
            series: {
                depth: 25,
                colorByPoint: false,
                color: "#DC143C",
    
            }
        },
        xAxis: {
            categories:['Two Girls and a Guy', 'Pollyanna', 'Rabbit Hole', 'Bad Lieutenant', 'Whore 1991', 'Law of Desire', 'Wide Sargasso Sea', 'Zoot Suit', 'Pink Flamingos', 
                        'The Dreamers', 'Sound of My Voice', 'Tokyo Decadence', 'Elles', 'Take Shelter', 'Palo Alto', 'The Dreamers', 'Locke', 'Hugo', 'Stoker', 'Whore'],
            labels: {
                skew3d: true,
                style: {
                    fontSize: '16px'
                }
            }
        },
        series: [{
            data: [1315026.0, 1250000.0, 1205034.0, 1038916.0, 958404.0, 858737.0, 659312.0, 556082.0, 401802.0, 307113.0, 294448.0, 257845.0, 256669.0, 
                   222016.0, 156309.0, 121165.0, 88390.0, 47784.0, 34913.0, 8404.0],
            name: 'Profit',
            showInLegend: false
        }]
    });
    // Look up the chart rendered into 'pete2' so the slider helpers can reference it
    var chart = Highcharts.charts.find(c => c && c.renderTo.id === 'pete2');

    // Show the current 3D angles, if the demo's value elements exist on the page
    function showValues() {
        ['alpha', 'beta', 'depth'].forEach(key => {
            var el = document.getElementById(key + '-value');
            if (el) el.innerHTML = chart.options.chart.options3d[key];
        });
    }

    // Activate the sliders (nothing happens when the page has no '#sliders' inputs)
    document.querySelectorAll('#sliders input').forEach(input => input.addEventListener('input', e => {
        chart.options.chart.options3d[e.target.id] = parseFloat(e.target.value);
        showValues();
        chart.redraw(false);
    }));

    showValues();
    

    This is the HTML script that uses the Highcharts library to visualize the percentage of each 'System Rating' in the 'Top 20 Highest Profitable Movies' of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a 'Pie Chart'. The chart is built with the HTML and JavaScript below.

    In [587]:
    %%HTML
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/export-data.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
    
    <figure class="highcharts-figure">
        <div id="pete3"></div> 
    </figure>
    
    In [588]:
    %%js
    Highcharts.chart('pete3', {
        chart: {
            width:650,
            height:500,
            styledMode: false,
            plotBackgroundColor: null,
            plotBorderWidth: null,
            plotShadow: false,
            type: 'pie'
           
        },
        title: {
            text: 'Top 20 Highest Profitable Movies'
        },
        tooltip: {
            pointFormat: '{series.name}: <b>{point.percentage:.0f}%</b>'
        },
        accessibility: {
            point: {
                valueSuffix: '%'
            }
        },
        plotOptions: {
            pie: {
                allowPointSelect: true,
                cursor: 'pointer',
                dataLabels: {
                    enabled: true,
                    format: '<b>{point.name}</b>: {point.percentage:.0f} %'
                    
                },
                showInLegend: true
            }
        },
        series: [{
            name: 'System Rating',
            colorByPoint: true,
            colors: ['#f24a0c','#b30707','#edBa66','#bf6849'],
            data: [{
                name: 'G',
                y: 30,
                selected: true,
            }, {
                name: 'PG',
                y: 20,
                sliced: true,
            }, {
                name: 'R',
                y: 30,
                sliced: true,
                selected: true
            }, {
                name: 'PG-13',
                y: 20
            }]
        }]
    });
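The percentages entered by hand in the pie series above can be derived directly from the 'twenty_rat' list. The sketch below shows one way to do that with `collections.Counter`; the function name `rating_percentages` is hypothetical and not part of the original notebook.

```python
from collections import Counter


def rating_percentages(ratings):
    """Map each rating to its share of the list, in percent."""
    counts = Counter(ratings)
    total = len(ratings)
    return {rating: 100 * count / total for rating, count in counts.items()}


# System ratings of the Top 20 Highest Profitable Movies (from 'twenty_rat')
twenty_rat = ['G', 'PG-13', 'PG-13', 'PG', 'R', 'PG', 'G', 'R', 'R', 'R',
              'PG-13', 'R', 'R', 'G', 'PG', 'PG', 'G', 'G', 'G', 'PG-13']

shares = rating_percentages(twenty_rat)
print(shares)
```

For this list the shares are 30% for G, 20% for PG-13, 20% for PG and 30% for R, matching the `y` values in the chart.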
    

    This is the HTML script that uses the Highcharts library to visualize the percentage of each 'System Rating' in the 'Top 20 Lowest Profitable Movies' of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a 'Pie Chart'. The chart is built with the HTML and JavaScript below.

    In [57]:
    %%HTML
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/export-data.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
    
    <figure class="highcharts-figure">
        <div id="pete4"></div> 
    </figure>
    
    In [58]:
    %%js
    Highcharts.chart('pete4', {
        chart: {
            width:650,
            height:500,
            styledMode: false,
            plotBackgroundColor: null,
            plotBorderWidth: null,
            plotShadow: false,
            type: 'pie'
           
        },
        title: {
            text: 'Top 20 Lowest Profitable Movies'
        },
        tooltip: {
            pointFormat: '{series.name}: <b>{point.percentage:.0f}%</b>'
        },
        accessibility: {
            point: {
                valueSuffix: '%'
            }
        },
        plotOptions: {
            pie: {
                allowPointSelect: true,
                cursor: 'pointer',
                dataLabels: {
                    enabled: true,
                    format: '<b>{point.name}</b>: {point.percentage:.0f} %'
                    
                },
                showInLegend: true
            }
        },
        series: [{
            name: 'System Rating',
            colorByPoint: true,
            colors: ['#ff004d','#850a33','#d97e99','#9c6671','#e6a3ba'],
            data: [{
                name: 'NC-17',
                y: 55
            }, {
                name: 'PG',
                y: 5,
                sliced: true,
                selected: true
            }, {
                name: 'R',
                y: 30
            }, {
                name: 'PG-13',
                y: 5,
                selected: true
            }, {
                name: 'G',
                y: 5,
                sliced: true,
                selected: true
            }]
        }]
    });
    

    A for loop is used to put the names and profits of the top 20 lowest profitable movies into HTML code, which will be pasted in the cell below.

    Blueprint: What Movie is the most Successful.

    Just because a movie has made the most profit doesn't mean it is more profitable than a movie that made less profit or revenue. Five main factors were used to better understand which movie in the 'Drama_DataFrame' dataframe was the most successful: the cost, the revenue, the profit, the gross profit margin percentage (the percentage of the revenue that each movie keeps as profit) and the profit percentage by cost (the profit compared with the cost, as a percentage).
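The two percentage factors can be sketched as below. This assumes the usual readings of the descriptions above, namely profit divided by revenue and profit divided by cost, each multiplied by 100; the function names are hypothetical. The example values are those of 'Django Unchained' from the 'Drama_DataFrame' rows shown earlier.

```python
def gross_profit_margin_pct(profit, revenue):
    """Profit as a percentage of revenue (assumed definition)."""
    return 100 * profit / revenue


def profit_by_cost_pct(profit, cost):
    """Profit as a percentage of cost (assumed definition)."""
    return 100 * profit / cost


# Django Unchained: production budget and worldwide gross
cost = 100000000.0
revenue = 449948323
profit = revenue - cost  # 349948323.0, matching the 'Profit' column

print(round(gross_profit_margin_pct(profit, revenue), 1))
print(round(profit_by_cost_pct(profit, cost), 1))
```

The two measures answer different questions: the margin can never exceed 100%, while the profit percentage by cost is unbounded and rewards low-budget movies that earn many times what they cost.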

    This is the blueprint for creating the eleventh, twelfth, thirteenth and fourteenth visualizations. Highcharts will be used to create these graphs.

    Blueprint:

    • The factors are split into two groups according to the type of graph used to analyze them. The profit, revenue and cost variables will be analyzed using the Highcharts Combination Chart, which allows different series types to be combined in a single chart. The demo chart is a set of column series overlaid by a line and a pie chart; the line illustrates the column averages while the pie illustrates the column totals. In this project, however, the pie series will not be used, and there will be two column series overlaid by a line series. The two columns will be revenue and profit, and the line series will be the cost of all the movies in the blueprint. Like every other graph built with Highcharts, these graphs have two types of code, HTML and JavaScript. The HTML section is very similar to all the previous graphs except for two main factors. The first is the HTML code that allows the combination graph to be displayed;

    The other factor is the two different div ids for the combination charts:

    <div id="x1" style="display: inline-block"></div>
    <div id="no1" style="display: inline-block"></div>

    These are the names of the two combination charts, each of which analyzes ten of the Top 20 Most Profitable Movies. The JavaScript section for these graphs consists of four main subsections: the categories, the two series of type column, and the series of type spline. The categories subsection is a list of the names of all the movies that will be analyzed. The two column series are the profit and revenue of all the movies, and the spline series is the cost of all the movies.

                series:[{
                    type:'column',
                    color:"#111E6C",
                    name:'Profit',
                    data:[2351345279,2303109231,1148258224,1135772799,1015392272,984846267,912044677,
                          894039076,890069413,878000000]
                },{
                    type:'column',
                    color:Highcharts.getOptions().colors[1],
                    name:'Revenue',
                    data:[2776345279,2366000000,1348258224,1305772799,1215392272,1234846267,1027044677,
                          1104039076,1140069413,1078000000]
                },{
                    type:'spline',
                    color:'gold',
                    name:'Cost',
                    data:[425000000,62890769,200000000,170000000,200000000,250000000,115000000,
                          210000000,250000000,200000000]
                }]
    
    
    

    The column charts use two other div ids:

            <div id="n1" style="display: inline-block"></div>
            <div id="xo1" style="display: inline-block"></div>
    
    

    The Gross Profit Margin Percentage and the Profit Percentage by Cost will both be analyzed using a Highcharts combination chart, but with column series only. The two columns will be the Gross Profit Margin Percentage and the Profit Percentage by Cost of each movie.

    The JavaScript section for the column charts consists of the categories subsection, which holds the names of all the movies.

      categories:['Zootopia| 11th Highest',
      'Finding Nemo| 12th Highest',
      'The Jungle Book| 13th Highest',
      'The Lord of the Rings: The Fellowship of the Ring| 14th Highest',
      'Ice Age: Dawn of the Dinosaurs| 15th Highest',
      'Star Wars Ep. III: Revenge of the Sith| 16th Highest',
      'The Hobbit: The Battle of the Five Armies| 17th Highest',
      'The Twilight Saga: Breaking Dawn, Part 2| 18th Highest',
      'Inside Out| 19th Highest',
      'Deadpool 2| 20th Highest']
    
    

    It also consists of the series subsection, which has two series of type column: one where the gross profit margin percentage is scripted and one for the profit percentage by cost.

      series:[{
           type:'column',
           color:'#3FE0D0',
           name:'Gross Profit Margin Percentage',
           data:[85.0, 97.0, 85.0, 87.0, 84.0, 80.0, 89.0, 81.0, 78.0, 81.0]
       },{
           type:'column',
           color:'#008081',
           name:'Profit Percentage by Cost',
           data:[553.0, 3662.0, 574.0, 668.0, 508.0, 394.0, 793.0, 426.0, 356.0, 439.0]
       }]  
    

    This is the 'Drama_DataFrame' dataframe.

    In [739]:
    Drama_DataFrame
    
    Out[739]:
    Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x Worldwide_Gross Worldwide_Gross_x Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
    0 Hugo Nov 23, 2011 Drama PG 180000000.0 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 180047784 $180,047,784 47784.0 $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Asa Butterfield Martin Scorsese John Logan
    1 The Wolfman Feb 12, 2010 Drama R 150000000.0 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 142634358 $142,634,358 -7365642.0 $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Benicio Del Toro Joe Johnston Andrew Kevin Walker
    2 Gravity Oct 4, 2013 Drama PG-13 110000000.0 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 693698673 $693,698,673 583698673.0 $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. Sandra Bullock Alfonso Cuarón Alfonso Cuarón
    3 Django Unchained Dec 25, 2012 Drama R 100000000.0 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 449948323 $449,948,323 349948323.0 $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Jamie Foxx Quentin Tarantino Quentin Tarantino
    4 Sing Dec 21, 2016 Drama PG-13 75000000.0 $75,000,000 270329045 $270,329,045 363800000.0 $363,800,000 634454789 $634,454,789 559454789.0 $559,454,789 63445479 63,445,479 98.0 7.1 TriStar Pictures Lorraine Bracco Richard Baskin Dean Pitchford
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    301 A Dirty Shame September 24, 2004 Drama NC-17 15000000.0 $15,000,000 1339668 $1,339,668 574498.0 $574,498 1914166 $1,914,166 -13085834.0 $-13,085,834 191417 191,417 84.0 5.1 Killer Films Suzanne Shepherd John Waters John Waters
    302 Young Adam April 16, 2004 Drama NC-17 6400000.0 $6,400,000 767373 $767,373 1794447.0 $1,794,447 2561820 $2,561,820 -3838180.0 $-3,838,180 256182 256,182 98.0 6.4 Recorded Picture Company Tilda Swinton David Mackenzie \tDavid Mackenzie
    303 Whore 1991 October 4, 1991 Drama NC-17 50000.0 $50,000 0 $0 0.0 $0 1008404 $1,008,404 958404.0 $958,404 100840 100,840 80.0 5.5 Cheap Date Theresa Russell Ken Russell Deborah Dalton
    304 Ma Mère May 13, 2005 Drama NC-17 3259572.0 $3,259,572 71616 $71,616 950532.0 $950,532 1022148 $1,022,148 -2237424.0 $-2,237,424 102215 102,215 110.0 5.0 Gemini Films Louis Garrel Christophe Honoré Christophe Honoré
    305 Law of Desire April 3, 1987 Drama NC-17 612072.0 $612,072 0 $0 0.0 $0 1470809 $1,470,809 858737.0 $858,737 147081 147,081 82.0 7.1 El Deseo Antonio Banderas Pedro Almodóvar Pedro Almodóvar

    306 rows × 22 columns

    Creating a list of all the 'Movies Profits' from 'Drama_DataFrame' dataframe.

    In [737]:
    profit_all = []
    for i,x in enumerate(Drama_DataFrame.Profit):
        profit_all.append(x)
    

    Creating a list of all the 'Movies Names' from 'Drama_DataFrame' dataframe.

    In [738]:
    name_all = []
    for i,x in enumerate(Drama_DataFrame.Movie):
        name_all.append(x)
    

    Creating a list of all the 'Movies Revenue' from 'Drama_DataFrame' dataframe.

    In [740]:
    rev_all = []
    for i,x in enumerate(Drama_DataFrame.Worldwide_Gross):
        rev_all.append(x)
    

    Creating a list of all the 'Movies Budget' from 'Drama_DataFrame' dataframe.

    In [741]:
    bud_all = []
    for i,x in enumerate(Drama_DataFrame.Production_Budget):
        bud_all.append(x)
    

    Creating a list of all the 'Movies Return On Investment Percentage' from 'Drama_DataFrame' dataframe.

    In [744]:
    rio_per_all = []
    for i,x in enumerate(Drama_DataFrame.Profit):
        j = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
        rio_per_all.append(int(round(j,0)))
    

    Creating a list of all the 'Movies Net Profit Margin' from 'Drama_DataFrame' dataframe.

    In [748]:
    npm_all = []
    for i,x in enumerate(Drama_DataFrame.Profit):
        j = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i])*100
        npm_all.append(int(round(j,0)))
    

    Creating tuples that each combine a movie's 'Name, Profit, Revenue, Budget, ROI, NPM', which will then be stored in a list.

    In [750]:
    all_all = []
    for i,x in enumerate(profit_all):
        all_all.append((name_all[i],x,rev_all[i],bud_all[i],rio_per_all[i],npm_all[i]))
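    As a possible alternative to indexing every list by position, the built-in `zip` can combine the parallel lists in one step. This is a minimal sketch that reuses the notebook's list names but fills them with only three sample entries copied from the printed output, so it can run on its own.

```python
# Three sample entries per list, copied from the notebook's printed output.
name_all = ['The Lion King 1994', 'Gravity', 'Sing']
profit_all = [941214868.0, 583698673.0, 559454789.0]
rev_all = [986214868, 693698673, 634454789]
bud_all = [45000000.0, 110000000.0, 75000000.0]
rio_per_all = [2092, 531, 746]
npm_all = [95, 84, 88]

# zip pairs up the i-th element of every list into one tuple per movie.
all_all = list(zip(name_all, profit_all, rev_all, bud_all, rio_per_all, npm_all))

for row in all_all:
    print(row)
```

    Note that `zip` stops at the shortest input, so it assumes all six lists have the same length, as they do when they come from the same dataframe.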
    

    After creating the 'all_all' list, the list will be sorted by the 'Profit' of each movie in descending order, and the 'Top 20 Highest Profitable Movies' will be printed out.

    In [753]:
    all_all.sort(key=lambda i:i[1],reverse=True)
    print(all_all[:20])
    
    [('The Lion King 1994', 941214868.0, 986214868, 45000000.0, 2092, 95), ('Gravity', 583698673.0, 693698673, 110000000.0, 531, 84), ('Sing', 559454789.0, 634454789, 75000000.0, 746, 88), ('Tex', 544368315.0, 549368315, 5000000.0, 10887, 99), ('Fifty Shades of Grey', 530998101.0, 570998101, 40000000.0, 1327, 93), ('Cinderella', 447351353.0, 542351353, 95000000.0, 471, 82), ('Beauty and the Beast 1991', 418656843.0, 438656843, 20000000.0, 2093, 95), ('Django Unchained', 349948323.0, 449948323, 100000000.0, 350, 78), ('Fifty Shades Darker', 326398492.0, 381398492, 55000000.0, 593, 86), ('Black Swan', 318266710.0, 331266710, 13000000.0, 2448, 96), ('A Quiet Place', 317522294.0, 334522294, 17000000.0, 1868, 95), ('Fifty Shades Freed', 316350619.0, 371350619, 55000000.0, 575, 85), ('Gone Girl', 307567189.0, 368567189, 61000000.0, 504, 83), ('The Secret Garden', 293281000.0, 311281000, 18000000.0, 1629, 94), ('Wonder', 285937718.0, 305937718, 20000000.0, 1430, 93), ('Wonder', 284604712.0, 304604712, 20000000.0, 1423, 93), ('The Sound of Music', 278014195.0, 286214195, 8200000.0, 3390, 97), ('Bambi 1942', 267142000.0, 268000000, 858000.0, 31135, 100), ('The Hunchback of Notre Drame', 255500000.0, 325500000, 70000000.0, 365, 78), ('True Grit', 217276928.0, 252276928, 35000000.0, 621, 86)]
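    Sorting in place changes the order of 'all_all' for every later cell. If the original order matters, one option is `heapq.nlargest` from the standard library, which returns the top entries without modifying the list. The sketch below uses a hypothetical five-row sample (`rows`) built from names and profits shown in the dataframe above.

```python
import heapq
from operator import itemgetter

# (name, profit) pairs taken from the first rows of Drama_DataFrame
rows = [
    ('Hugo', 47784.0),
    ('The Wolfman', -7365642.0),
    ('Gravity', 583698673.0),
    ('Django Unchained', 349948323.0),
    ('Sing', 559454789.0),
]

# Pick the three largest profits; rows itself keeps its original order.
top3 = heapq.nlargest(3, rows, key=itemgetter(1))
print(top3)
```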
    

    Getting the 'Names' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [757]:
    top_name = []
    for x,i in enumerate(all_all[:20]):
        if x == 0 : top_name.append(i[0]+' | '+str(int(x+1))+'st Highest')
        elif x == 1 : top_name.append(i[0]+' | '+str(int(x+1))+'nd Highest')
        elif x == 2 : top_name.append(i[0]+' | '+str(int(x+1))+'rd Highest')
        else: top_name.append(i[0]+' | '+str(int(x+1))+'th Highest')
    print(top_name)
    
    ['The Lion King 1994 | 1st Highest', 'Gravity | 2nd Highest', 'Sing | 3rd Highest', 'Tex | 4th Highest', 'Fifty Shades of Grey | 5th Highest', 'Cinderella | 6th Highest', 'Beauty and the Beast 1991 | 7th Highest', 'Django Unchained | 8th Highest', 'Fifty Shades Darker | 9th Highest', 'Black Swan | 10th Highest', 'A Quiet Place | 11th Highest', 'Fifty Shades Freed | 12th Highest', 'Gone Girl | 13th Highest', 'The Secret Garden | 14th Highest', 'Wonder | 15th Highest', 'Wonder | 16th Highest', 'The Sound of Music | 17th Highest', 'Bambi 1942 | 18th Highest', 'The Hunchback of Notre Drame | 19th Highest', 'True Grit | 20th Highest']
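    The branch-per-rank approach above works for ranks 1 to 20, but it would label rank 21 as '21th'. If longer rankings were ever needed, a general ordinal helper could be used instead. The functions `ordinal` and `rank_labels` below are hypothetical names for this sketch, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'          # 11th, 12th, 13th, 111th, ...
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


def rank_labels(names):
    """Build 'Name | Nth Highest' labels in ranking order."""
    return [f'{name} | {ordinal(rank)} Highest'
            for rank, name in enumerate(names, start=1)]


print(rank_labels(['The Lion King 1994', 'Gravity', 'Sing', 'Tex']))
```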
    

    Getting the 'Profit' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [758]:
    top_profit = []
    for i in all_all[:20]:top_profit.append(i[1])
    print(top_profit)
    
    [941214868.0, 583698673.0, 559454789.0, 544368315.0, 530998101.0, 447351353.0, 418656843.0, 349948323.0, 326398492.0, 318266710.0, 317522294.0, 316350619.0, 307567189.0, 293281000.0, 285937718.0, 284604712.0, 278014195.0, 267142000.0, 255500000.0, 217276928.0]
    

    Getting the 'Revenue' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [759]:
    top_rev = []
    for i in all_all[:20]:top_rev.append(i[2])
    print(top_rev)
    
    [986214868, 693698673, 634454789, 549368315, 570998101, 542351353, 438656843, 449948323, 381398492, 331266710, 334522294, 371350619, 368567189, 311281000, 305937718, 304604712, 286214195, 268000000, 325500000, 252276928]
    

    Getting the 'Budget' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [760]:
    top_bud = []
    for i in all_all[:20]:top_bud.append(i[3])
    print(top_bud)
    
    [45000000.0, 110000000.0, 75000000.0, 5000000.0, 40000000.0, 95000000.0, 20000000.0, 100000000.0, 55000000.0, 13000000.0, 17000000.0, 55000000.0, 61000000.0, 18000000.0, 20000000.0, 20000000.0, 8200000.0, 858000.0, 70000000.0, 35000000.0]
    

    Getting the 'ROI' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [761]:
    top_roi = []
    for i in all_all[:20]:top_roi.append(i[4])
    print(top_roi)
    
    [2092, 531, 746, 10887, 1327, 471, 2093, 350, 593, 2448, 1868, 575, 504, 1629, 1430, 1423, 3390, 31135, 365, 621]
    

    Getting the 'NPM' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.

    In [762]:
    top_npm = []
    for i in all_all[:20]:top_npm.append(i[5])
    print(top_npm)
    
    [95, 84, 88, 99, 93, 82, 95, 78, 86, 96, 95, 85, 83, 94, 93, 93, 97, 100, 78, 86]
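    The six extraction loops above all walk the same list of tuples. As a sketch of a more compact alternative, `zip(*rows)` transposes the tuples into one sequence per field in a single pass. The variable `top` here is a three-row sample copied from the printed 'all_all' output, and the unpacked names are illustrative.

```python
# Three rows of (name, profit, revenue, budget, ROI, NPM) from the printed output
top = [
    ('The Lion King 1994', 941214868.0, 986214868, 45000000.0, 2092, 95),
    ('Gravity', 583698673.0, 693698673, 110000000.0, 531, 84),
    ('Sing', 559454789.0, 634454789, 75000000.0, 746, 88),
]

# zip(*top) groups the first items together, then the second items, and so on.
names, profits, revenues, budgets, rois, npms = (list(col) for col in zip(*top))

print(names)
print(rois)
```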
    

    This is the HTML script that uses the Highcharts library to visualize the Revenue, Profit, Cost, Return On Investment and Net Profit Margin of the movies ranked 1st to 10th among the 'Top 20 Highest Profitable Movies' in the Drama Genre within the 'Drama_DataFrame' dataframe, to see which movie is the most 'Successful' using a 'Column Series and Line Chart infused'. This will be done using JavaScript and HTML below.

    In [22]:
    %%html
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
    <figure class="highcharts-figure">
        <div id="-" style='width:1000' ></div>
        <div id="-"style='width:1000' ></div>
    </figure>
    
    In [23]:
    %%js
    Highcharts.chart('x',{
        chart: {
            width: 900,
            height: 350
        },
        title:{
            text:"What Movie Is The Most Successful?"
        },
        xAxis:{
            categories:['The Lion King 1994 | 1st Highest', 'Gravity | 2nd Highest', 'Sing | 3rd Highest', 'Tex | 4th Highest', 
                        'Fifty Shades of Grey | 5th Highest', 'Cinderella | 6th Highest', 'Beauty and the Beast 1991 | 7th Highest', 
                        'Django Unchained | 8th Highest', 'Fifty Shades Darker | 9th Highest', 'Black Swan | 10th Highest'],
            crosshair:{
                enabled:true
            },
            labels:{
                enabled:false
            }
        },
        yAxis:{
            min:0,
            max:1000000000,
            step:250000000,
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        plotOptions:{
           series:{
               marker:{
                   states:{
                       hover:{
                           radiusPlus:12,
                           lineWidthPlus:5
                       }
                   }
               }
           } 
        },
        tooltip:{
            shared:false
        },
        states:{
            hover:{
                lineWidthPlus:10
            }
        },
        series:[{
            type:'column',
            color:'#C21602',
            name:'Profit',
            data:[941214868.0, 583698673.0, 559454789.0, 544368315.0, 530998101.0, 447351353.0, 
                  418656843.0, 349948323.0, 326398492.0, 318266710.0]
        },{
            type:'column',
            color:'#F88379',
            name:'Revenue',
            data:[986214868, 693698673, 634454789, 549368315, 570998101, 542351353, 
                  438656843, 449948323, 381398492, 331266710]
        },{
            type:'spline',
            color:'gold',
            name:'Cost',
            data:[45000000.0, 110000000.0, 75000000.0, 5000000.0, 40000000.0, 
                  95000000.0, 20000000.0, 100000000.0, 55000000.0, 13000000.0],
            marker:{
                lineWidth: 2,
                lineColor: 'gold',
                fillColor: 'white',
                radius:2
            }
       }]
    });
    
    In [31]:
    %%js
    (function (H) {
        H.addEvent(H.Axis, 'afterInit', function () {
            const logarithmic = this.logarithmic;
    
            if (logarithmic && this.options.custom && this.options.custom.allowNegativeLog) {
    
                // Avoid errors on negative numbers on a log axis
                this.positiveValuesOnly = false;
    
                // Override the converter functions
                logarithmic.log2lin = num => {
                    const isNegative = num < 0;
    
                    let adjustedNum = Math.abs(num);
    
                    if (adjustedNum < 10) {
                        adjustedNum += (10 - adjustedNum) / 10;
                    }
    
                    const result = Math.log(adjustedNum) / Math.LN10;
                    return isNegative ? -result : result;
                };
    
                logarithmic.lin2log = num => {
                    const isNegative = num < 0;
    
                    let result = Math.pow(10, Math.abs(num));
                    if (result < 10) {
                        result = (10 * (result - 1)) / (10 - 1);
                    }
                    return isNegative ? -result : result;
                };
            }
        });
    }(Highcharts));
    Highcharts.chart('-',{
            chart: {
            width: 900,
            height: 300
        },
            title:{
                text:""
            },
            xAxis:{
               categories:['The Lion King 1994 | 1st Highest', 'Gravity | 2nd Highest', 'Sing | 3rd Highest', 'Tex | 4th Highest', 
                        'Fifty Shades of Grey | 5th Highest', 'Cinderella | 6th Highest', 'Beauty and the Beast 1991 | 7th Highest', 
                        'Django Unchained | 8th Highest', 'Fifty Shades Darker | 9th Highest', 'Black Swan | 10th Highest'],
               crosshair:{
                   enabled:true
               },
               labels:{
                   enabled:true
               } 
            },
            yAxis: {
            type: 'logarithmic',
            custom: {
                allowNegativeLog: true
            },
            
      },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        plotOptions: {
             bar: {
                        dataLabels: {
                            enabled: true,
                            valueSuffix:'%',
                                    }
                    },
    				series: {
    					dataLabels: {
    						enabled: true,
                            valueSuffix:'%',
                            
                    style: {
                        textOutline: false ,
                        fontWeight: 'bold'
                    }
                        
                            
    					}
    				}
    				
        },
           tooltip:{
               valueSuffix:'%',
               shared:true
           },
           series:[{
               type:'column',
               color:'#F57070',
               name:'Net Profit Margin',
               data:[95, 84, 88, 99, 93, 82, 95, 78, 86, 96]
           },{
               type:'column',
               color:'#EC0303',
               name:'Return On Investment Percentage',
               data:[2092, 531, 746, 10887, 1327, 471, 2093, 350, 593, 2448]
           }]  
        
       });
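    The data arrays in the charts above were pasted in by hand from the printed Python lists. One way to reduce copy errors would be to generate the `series` text with `json.dumps`, since a JSON array of objects is also valid JavaScript. The helper `column_series` is a hypothetical name for this sketch; the colors and values are the ones used in the chart above.

```python
import json


def column_series(name, color, data):
    """Build one Highcharts column-series definition as a plain dict."""
    return {'type': 'column', 'color': color, 'name': name, 'data': data}


top_npm = [95, 84, 88, 99, 93, 82, 95, 78, 86, 96]
top_roi = [2092, 531, 746, 10887, 1327, 471, 2093, 350, 593, 2448]

series = [
    column_series('Net Profit Margin', '#F57070', top_npm),
    column_series('Return On Investment Percentage', '#EC0303', top_roi),
]

# The printed text can be pasted after "series:" in the %%js cell.
series_js = json.dumps(series, indent=4)
print(series_js)
```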
    

    This is the HTML script that uses the Highcharts library to visualize the Revenue, Profit, Cost, Return On Investment and Net Profit Margin of the movies ranked 11th to 20th among the 'Top 20 Highest Profitable Movies' in the Drama Genre within the 'Drama_DataFrame' dataframe, to see which movie is the most 'Successful' using a 'Column Series and Line Chart infused'. This will be done using JavaScript and HTML below.

    In [41]:
    %%HTML
    <script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
    <link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
    <script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
    <script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
    <figure class="highcharts-figure">
        <div id="-" style='width:1000' ></div>
        <div id="-" style='width:1000' ></div>
    </figure>
    
    In [28]:
    %%js
    Highcharts.chart('-',{
        chart: {
            width: 900,
            height: 350
        },
        title:{
            text:"What Movie Is The Most Successful?"
        },
        xAxis:{
            categories:['A Quiet Place | 11th Highest', 'Fifty Shades Freed | 12th Highest', 'Gone Girl | 13th Highest', 'The Secret Garden | 14th Highest', 
                        'Wonder | 15th Highest', 'Wonder | 16th Highest', 'The Sound of Music | 17th Highest', 'Bambi 1942 | 18th Highest', 
                        'The Hunchback of Notre Drame | 19th Highest', 'True Grit | 20th Highest'],
            crosshair:{
                enabled:true
            },
            labels:{
                enabled:false
            }
        },
        yAxis:{
            min:0,
            max:400000000,
            step:250000000,
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        plotOptions:{
           series:{
               marker:{
                   states:{
                       hover:{
                           radiusPlus:12,
                           lineWidthPlus:5
                       }
                   }
               }
           } 
        },
        tooltip:{
            shared:false
        },
        states:{
            hover:{
                lineWidthPlus:10
            }
        },
        series:[{
            type:'column',
            color:'#C21602',
            name:'Profit',
            data:[317522294.0, 316350619.0, 307567189.0, 293281000.0, 285937718.0, 284604712.0, 278014195.0, 267142000.0, 255500000.0, 217276928.0]
        },{
            type:'column',
            color:'#F88379',
            name:'Revenue',
            data:[334522294, 371350619, 368567189, 311281000, 305937718, 304604712, 286214195, 268000000, 325500000, 252276928]
        },{
            type:'spline',
            color:'gold',
            name:'Cost',
            data:[17000000.0, 55000000.0, 61000000.0, 18000000.0, 20000000.0, 20000000.0, 8200000.0, 858000.0, 70000000.0, 35000000.0],
            marker:{
                lineWidth: 2,
                lineColor: 'gold',
                fillColor: 'white',
                radius:2
            }
       }]
    });
    
    In [29]:
    %%js
    (function (H) {
        H.addEvent(H.Axis, 'afterInit', function () {
            const logarithmic = this.logarithmic;
    
            if (logarithmic && this.options.custom && this.options.custom.allowNegativeLog) {
    
                // Avoid errors on negative numbers on a log axis
                this.positiveValuesOnly = false;
    
                // Override the converter functions
                logarithmic.log2lin = num => {
                    const isNegative = num < 0;
    
                    let adjustedNum = Math.abs(num);
    
                    if (adjustedNum < 10) {
                        adjustedNum += (10 - adjustedNum) / 10;
                    }
    
                    const result = Math.log(adjustedNum) / Math.LN10;
                    return isNegative ? -result : result;
                };
    
                logarithmic.lin2log = num => {
                    const isNegative = num < 0;
    
                    let result = Math.pow(10, Math.abs(num));
                    if (result < 10) {
                        result = (10 * (result - 1)) / (10 - 1);
                    }
                    return isNegative ? -result : result;
                };
            }
        });
    }(Highcharts));
    Highcharts.chart('-',{
            chart: {
            width: 900,
            height: 310
        },
            title:{
                text:""
            },
            xAxis:{
               categories:['A Quiet Place | 11th Highest', 'Fifty Shades Freed | 12th Highest', 'Gone Girl | 13th Highest', 'The Secret Garden | 14th Highest', 
                        'Wonder | 15th Highest', 'Wonder | 16th Highest', 'The Sound of Music | 17th Highest', 'Bambi 1942 | 18th Highest', 
                        'The Hunchback of Notre Drame | 19th Highest', 'True Grit | 20th Highest'],
               crosshair:{
                   enabled:true
               },
               labels:{
                   enabled:true
               } 
            },
           yAxis:{
            type: 'logarithmic',
           },
          legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
           plotOptions: {
             bar: {
                        dataLabels: {
                            enabled: true,
                                    }
                    },
    				series: {
    					dataLabels: {
    						enabled: true,
                            
                    style: {
                        textOutline: false ,
                        fontWeight: 'bold'
                    }
                        
                            
    					}
    				}
    				
        },
           tooltip:{
               valueSuffix:'%',
               shared:true
           },
           series:[{
               type:'column',
               color:'#F57070',
               name:'Net Profit Margin',
               data:[95, 85, 83, 94, 93, 93, 97, 100, 78, 86]
           },{
               type:'column',
               color:'#EC0303',
               name:'Return On Investment Percentage',
               data:[1868, 575, 504, 1629, 1430, 1423, 3390, 31135, 365, 621]
           }]  
        
       });
    

    Analysis

    Blueprint: Number of Tickets Sold.

    The objective of this analysis is to identify which system rating best suits this genre. This gives an idea of what kind of audience is drawn to the genre, so that this audience can be specified and made the target focus. This is the blueprint for creating the last visualization of this project. Highcharts will be used to create this graph.

    Blueprint:

    • The graph used for this visualization is a Highcharts Donut chart, which is basically a hollow pie chart. This pie chart also has an inner chart, resulting in a hierarchical type of visualization.

    • The first approach to this chart is the HTML section, which is very simple: it has the div id, which is where the graph is named, and it is where the style and height should be chosen.

    • The second approach is the JavaScript code, which is divided into two sections, the inner pie and the outer pie. The inner pie shows the percentage of tickets sold under each system rating compared to the total tickets sold in the whole parent dataframe 'all_drama_info1'. The outer pie shows the individual movies in each category and the number of tickets they sold.

      • JavaScript Section:

        -First Section: The first section is a list called data. It consists of the name of each system rating and its percentage of tickets sold compared to the total tickets sold, along with the color, the sliced adjustment and the adjustment of the size of the ring pie.
        data:[{ name:'System Rating: R', y:28.0, sliced:true, color: '#4682B4',}

        -Second Section: The second section is also a list called data. It requires the name of each movie, the number of tickets it sold, the option to slice that section and the color of the slice.
        data:[{ name: ' Avatar ' , y: 213565021 , sliced:true, selected: true, color:"#4682B4",}
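    The inner-pie percentages described above are each rating's share of all tickets sold. The sketch below shows one way that share could be computed. The function name `ticket_share_by_rating` is hypothetical, and the five rows are only the first rows of 'Drama_DataFrame', so the resulting percentages are illustrative and will not match the values computed from the full dataframe.

```python
from collections import defaultdict

# (movie, rating, tickets) for the first five rows of Drama_DataFrame
rows = [
    ('Hugo', 'PG', 18004778),
    ('The Wolfman', 'R', 14263436),
    ('Gravity', 'PG-13', 69369867),
    ('Django Unchained', 'R', 44994832),
    ('Sing', 'PG-13', 63445479),
]


def ticket_share_by_rating(rows):
    """Return {rating: percent of all tickets sold}, rounded to one decimal."""
    totals = defaultdict(int)
    for _movie, rating, tickets in rows:
        totals[rating] += tickets
    grand_total = sum(totals.values())
    return {rating: round(count / grand_total * 100, 1)
            for rating, count in totals.items()}


shares = ticket_share_by_rating(rows)
print(shares)
```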

    This is the 'Drama_DataFrame' dataframe.

    In [799]:
    Drama_DataFrame
    
    Out[799]:
    Movie Release_Date Genre Rating Production_Budget Production_Budget_x Domestic_Gross Domestic_Gross_x Foreign_Gross Foreign_Gross_x Worldwide_Gross Worldwide_Gross_x Profit Profit_x Tickets Tickets_x Runtime Averagerating Company Star Director Writer
    0 Hugo Nov 23, 2011 Drama PG 180000000.0 $180,000,000 73864507 $73,864,507 111900000.0 $111,900,000 180047784 $180,047,784 47784.0 $47,784 18004778 18,004,778 126.0 7.5 Paramount Pictures Asa Butterfield Martin Scorsese John Logan
    1 The Wolfman Feb 12, 2010 Drama R 150000000.0 $150,000,000 62189884 $62,189,884 77800000.0 $77,800,000 142634358 $142,634,358 -7365642.0 $-7,365,642 14263436 14,263,436 NaN 5.8 NaN Benicio Del Toro Joe Johnston Andrew Kevin Walker
    2 Gravity Oct 4, 2013 Drama PG-13 110000000.0 $110,000,000 274092705 $274,092,705 449100000.0 $449,100,000 693698673 $693,698,673 583698673.0 $583,698,673 69369867 69,369,867 91.0 7.7 Warner Bros. Sandra Bullock Alfonso Cuarón Alfonso Cuarón
    3 Django Unchained Dec 25, 2012 Drama R 100000000.0 $100,000,000 162805434 $162,805,434 262600000.0 $262,600,000 449948323 $449,948,323 349948323.0 $349,948,323 44994832 44,994,832 165.0 8.4 The Weinstein Company Jamie Foxx Quentin Tarantino Quentin Tarantino
    4 Sing Dec 21, 2016 Drama PG-13 75000000.0 $75,000,000 270329045 $270,329,045 363800000.0 $363,800,000 634454789 $634,454,789 559454789.0 $559,454,789 63445479 63,445,479 98.0 7.1 TriStar Pictures Lorraine Bracco Richard Baskin Dean Pitchford
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    301 A Dirty Shame September 24, 2004 Drama NC-17 15000000.0 $15,000,000 1339668 $1,339,668 574498.0 $574,498 1914166 $1,914,166 -13085834.0 $-13,085,834 191417 191,417 84.0 5.1 Killer Films Suzanne Shepherd John Waters John Waters
    302 Young Adam April 16, 2004 Drama NC-17 6400000.0 $6,400,000 767373 $767,373 1794447.0 $1,794,447 2561820 $2,561,820 -3838180.0 $-3,838,180 256182 256,182 98.0 6.4 Recorded Picture Company Tilda Swinton David Mackenzie \tDavid Mackenzie
    303 Whore 1991 October 4, 1991 Drama NC-17 50000.0 $50,000 0 $0 0.0 $0 1008404 $1,008,404 958404.0 $958,404 100840 100,840 80.0 5.5 Cheap Date Theresa Russell Ken Russell Deborah Dalton
    304 Ma Mère May 13, 2005 Drama NC-17 3259572.0 $3,259,572 71616 $71,616 950532.0 $950,532 1022148 $1,022,148 -2237424.0 $-2,237,424 102215 102,215 110.0 5.0 Gemini Films Louis Garrel Christophe Honoré Christophe Honoré
    305 Law of Desire April 3, 1987 Drama NC-17 612072.0 $612,072 0 $0 0.0 $0 1470809 $1,470,809 858737.0 $858,737 147081 147,081 82.0 7.1 El Deseo Antonio Banderas Pedro Almodóvar Pedro Almodóvar

    306 rows × 22 columns

    Getting the number of Tickets sold in the R-rated category and the Names of the Movies from the 'Drama_DataFrame' dataframe.

    In [630]:
    var = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'R':
            var.append((Drama_DataFrame.Movie[i],Drama_DataFrame.Tickets[i]))
    print(var)
    
    [('The Wolfman', 14263436), ('Django Unchained', 44994832), ('Downsizing', 5446297), ('Gone Girl', 36856719), ('Priest', 8415403), ('Fifty Shades Darker', 38139849), ('Fifty Shades Freed', 37135062), ('Crimson Peak', 7496685), ('Zero Dark Thirty', 13461244), ('Fifty Shades of Grey', 57099810), ('The Master', 5064742), ('Biutiful', 2468752), ('Flight', 16055844), ('Tulip Fever', 679277), ('The Ides of March', 7773592), ('Nocturnal Animals', 3239868), ('The Water Diviner', 3105473), ('Stone', 406502), ('For Colored Girls', 3801787), ('The Debt', 4660405), ('Let Me In', 2827040), ('By the Sea', 372775), ('Miss Sloane', 771963), ('The Homesman', 821757), ('The Immigrant', 758501), ('Never Let Me Go', 1117372), ('The Reluctant Fundamentalist', 52873), ('Black Swan', 33126671), ('Ex Machina', 3835839), ('Room', 3626278), ('Chloe', 1183113), ('If Beale Street Could Talk', 1985917), ('Arbitrage', 3583071), ('Stoker', 1203491), ('Carol', 4284352), ('Quartet', 5617894), ('Hereditary', 7013390), ('Coriolanus', 217962), ('Melancholia', 2181730), ('Manchester by the Sea', 7773387), ('We Need to Talk About Kevin', 1076528), ('Hesher', 38295), ('Addicted', 1749924), ('Everything Must Go', 282101), ('Mommy', 1753600), ('Take Shelter', 497202), ('Boyhood', 5727305), ('Stake Land', 67948), ('The Witch', 4045452), ('Margin Call', 2043323), ('Whiplash', 3896904), ('Before Midnight', 2325193), ('Silent House', 1661076), ("Winter's Bone", 1613155), ('The Florida Project', 1129532), ('We Are Your Friends', 1015342), ('Locke', 208839), ('Knock Knock', 632852), ('Buried', 2127029), ('Unsane', 1424493), ('Blue Valentine', 1656624), ('Martha Marcy May Marlene', 543891), ('Palo Alto', 115631), ('I Origins', 85240), ('The Canyons', 6238), ('Sound of My Voice', 42945), ('A Ghost Story', 276978), ('Ordinary People', 5476692), ('Fame', 7721184), ('Endless Love', 3471817), ('Ghost Story', 195168), ('One from the Heart', 63680), ('The Hand', 244758), ('Pennies from Heaven', 917129), ('Zoot 
Suit', 325608), ('Rich and Famous', 1300000), ('Raggedy Man', 1100000)]
    

    After creating the 'var' list, sort it in descending order by the number of tickets each movie sold.

    In [632]:
    var.sort(key=lambda i:i[1],reverse=True)
    print(var)
    
    [('Fifty Shades of Grey', 57099810), ('Django Unchained', 44994832), ('Fifty Shades Darker', 38139849), ('Fifty Shades Freed', 37135062), ('Gone Girl', 36856719), ('Black Swan', 33126671), ('Flight', 16055844), ('The Wolfman', 14263436), ('Zero Dark Thirty', 13461244), ('Priest', 8415403), ('The Ides of March', 7773592), ('Manchester by the Sea', 7773387), ('Fame', 7721184), ('Crimson Peak', 7496685), ('Hereditary', 7013390), ('Boyhood', 5727305), ('Quartet', 5617894), ('Ordinary People', 5476692), ('Downsizing', 5446297), ('The Master', 5064742), ('The Debt', 4660405), ('Carol', 4284352), ('The Witch', 4045452), ('Whiplash', 3896904), ('Ex Machina', 3835839), ('For Colored Girls', 3801787), ('Room', 3626278), ('Arbitrage', 3583071), ('Endless Love', 3471817), ('Nocturnal Animals', 3239868), ('The Water Diviner', 3105473), ('Let Me In', 2827040), ('Biutiful', 2468752), ('Before Midnight', 2325193), ('Melancholia', 2181730), ('Buried', 2127029), ('Margin Call', 2043323), ('If Beale Street Could Talk', 1985917), ('Mommy', 1753600), ('Addicted', 1749924), ('Silent House', 1661076), ('Blue Valentine', 1656624), ("Winter's Bone", 1613155), ('Unsane', 1424493), ('Rich and Famous', 1300000), ('Stoker', 1203491), ('Chloe', 1183113), ('The Florida Project', 1129532), ('Never Let Me Go', 1117372), ('Raggedy Man', 1100000), ('We Need to Talk About Kevin', 1076528), ('We Are Your Friends', 1015342), ('Pennies from Heaven', 917129), ('The Homesman', 821757), ('Miss Sloane', 771963), ('The Immigrant', 758501), ('Tulip Fever', 679277), ('Knock Knock', 632852), ('Martha Marcy May Marlene', 543891), ('Take Shelter', 497202), ('Stone', 406502), ('By the Sea', 372775), ('Zoot Suit', 325608), ('Everything Must Go', 282101), ('A Ghost Story', 276978), ('The Hand', 244758), ('Coriolanus', 217962), ('Locke', 208839), ('Ghost Story', 195168), ('Palo Alto', 115631), ('I Origins', 85240), ('Stake Land', 67948), ('One from the Heart', 63680), ('The Reluctant Fundamentalist', 52873), 
('Sound of My Voice', 42945), ('Hesher', 38295), ('The Canyons', 6238)]
    
    In [725]:
    # Collect the ticket counts from 'var' and print the R-rated total.
    all_to = [tickets for _, tickets in var]
    print(sum(all_to))
    
    449780631
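
The total above is the denominator for each movie's share of R-rated ticket sales in the chart data that follows. A minimal sketch of that arithmetic, using three (movie, tickets) pairs and the total copied from the outputs above. The name `share_of_total` is a hypothetical helper and is not part of the notebook:

```python
# Total R-rated drama tickets, copied from the notebook output above.
R_TOTAL = 449780631

# A few (movie, tickets) pairs copied from the sorted 'var' list.
sample = [
    ('Fifty Shades of Grey', 57099810),
    ('Django Unchained', 44994832),
    ('The Canyons', 6238),
]

def share_of_total(tickets, total):
    """Return tickets as a fraction of total (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return tickets / total

shares = {name: share_of_total(tickets, R_TOTAL) for name, tickets in sample}
for name, share in shares.items():
    print(f"{name}: {share:.4%}")
```

Each value is a fraction between 0 and 1, which is what the `y` field in the generated chart entries holds.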
    

    A for loop prints each R-rated movie's name and its share of the total R-rated tickets sold, formatted as data-point entries for the HTML/JavaScript chart code. The printed text is pasted into the cell below to create the interactive JavaScript graph.

    In [772]:
    # Compute the R-rated total from 'var' rather than hard-coding 449780631.
    total = sum(tickets for _, tickets in var)
    for name, tickets in var:
        print('         },{ \n           name:',"'",name,"'",','+'\n           y:',tickets/total,','+'\n           color:"#581845",')
    
             },{ 
               name: ' Fifty Shades of Grey ' ,
               y: 0.126950353271215 ,
               color:"#581845",
             },{ 
               name: ' Django Unchained ' ,
               y: 0.1000372823968936 ,
               color:"#581845",
             },{ 
               name: ' Fifty Shades Darker ' ,
               y: 0.08479655719100986 ,
               color:"#581845",
             },{ 
               name: ' Fifty Shades Freed ' ,
               y: 0.08256260817064842 ,
               color:"#581845",
             },{ 
               name: ' Gone Girl ' ,
               y: 0.08194376649358207 ,
               color:"#581845",
             },{ 
               name: ' Black Swan ' ,
               y: 0.07365072819242854 ,
               color:"#581845",
             },{ 
               name: ' Flight ' ,
               y: 0.035697055171768834 ,
               color:"#581845",
             },{ 
               name: ' The Wolfman ' ,
               y: 0.03171198361362964 ,
               color:"#581845",
             },{ 
               name: ' Zero Dark Thirty ' ,
               y: 0.02992846528333498 ,
               color:"#581845",
             },{ 
               name: ' Priest ' ,
               y: 0.018710016439102733 ,
               color:"#581845",
             },{ 
               name: ' The Ides of March ' ,
               y: 0.017283074157099485 ,
               color:"#581845",
             },{ 
               name: ' Manchester by the Sea ' ,
               y: 0.01728261837935836 ,
               color:"#581845",
             },{ 
               name: ' Fame ' ,
               y: 0.0171665551334068 ,
               color:"#581845",
             },{ 
               name: ' Crimson Peak ' ,
               y: 0.016667425147527084 ,
               color:"#581845",
             },{ 
               name: ' Hereditary ' ,
               y: 0.015592912448024023 ,
               color:"#581845",
             },{ 
               name: ' Boyhood ' ,
               y: 0.01273355188120584 ,
               color:"#581845",
             },{ 
               name: ' Quartet ' ,
               y: 0.012490297742501055 ,
               color:"#581845",
             },{ 
               name: ' Ordinary People ' ,
               y: 0.012176362481024666 ,
               color:"#581845",
             },{ 
               name: ' Downsizing ' ,
               y: 0.012108785093504838 ,
               color:"#581845",
             },{ 
               name: ' The Master ' ,
               y: 0.011260471551964184 ,
               color:"#581845",
             },{ 
               name: ' The Debt ' ,
               y: 0.010361506651894932 ,
               color:"#581845",
             },{ 
               name: ' Carol ' ,
               y: 0.009525425740264925 ,
               color:"#581845",
             },{ 
               name: ' The Witch ' ,
               y: 0.008994277923897528 ,
               color:"#581845",
             },{ 
               name: ' Whiplash ' ,
               y: 0.0086640102561464 ,
               color:"#581845",
             },{ 
               name: ' Ex Machina ' ,
               y: 0.008528244071941818 ,
               color:"#581845",
             },{ 
               name: ' For Colored Girls ' ,
               y: 0.008452536054181487 ,
               color:"#581845",
             },{ 
               name: ' Room ' ,
               y: 0.008062325831900929 ,
               color:"#581845",
             },{ 
               name: ' Arbitrage ' ,
               y: 0.00796626344721367 ,
               color:"#581845",
             },{ 
               name: ' Endless Love ' ,
               y: 0.007718911755450848 ,
               color:"#581845",
             },{ 
               name: ' Nocturnal Animals ' ,
               y: 0.007203218139466748 ,
               color:"#581845",
             },{ 
               name: ' The Water Diviner ' ,
               y: 0.006904416922301841 ,
               color:"#581845",
             },{ 
               name: ' Let Me In ' ,
               y: 0.006285375147690608 ,
               color:"#581845",
             },{ 
               name: ' Biutiful ' ,
               y: 0.005488791268114878 ,
               color:"#581845",
             },{ 
               name: ' Before Midnight ' ,
               y: 0.005169615674268552 ,
               color:"#581845",
             },{ 
               name: ' Melancholia ' ,
               y: 0.004850653517803438 ,
               color:"#581845",
             },{ 
               name: ' Buried ' ,
               y: 0.00472903645332829 ,
               color:"#581845",
             },{ 
               name: ' Margin Call ' ,
               y: 0.004542932396748761 ,
               color:"#581845",
             },{ 
               name: ' If Beale Street Could Talk ' ,
               y: 0.004415301289396786 ,
               color:"#581845",
             },{ 
               name: ' Mommy ' ,
               y: 0.0038987894967847116 ,
               color:"#581845",
             },{ 
               name: ' Addicted ' ,
               y: 0.00389061662372918 ,
               color:"#581845",
             },{ 
               name: ' Silent House ' ,
               y: 0.0036930803274185455 ,
               color:"#581845",
             },{ 
               name: ' Blue Valentine ' ,
               y: 0.0036831821688648927 ,
               color:"#581845",
             },{ 
               name: ' Winter's Bone ' ,
               y: 0.0035865372779914128 ,
               color:"#581845",
             },{ 
               name: ' Unsane ' ,
               y: 0.0031670839111789142 ,
               color:"#581845",
             },{ 
               name: ' Rich and Famous ' ,
               y: 0.0028902978705634837 ,
               color:"#581845",
             },{ 
               name: ' Stoker ' ,
               y: 0.0026757288265710135 ,
               color:"#581845",
             },{ 
               name: ' Chloe ' ,
               y: 0.0026304222957969038 ,
               color:"#581845",
             },{ 
               name: ' The Florida Project ' ,
               y: 0.0025112953341025483 ,
               color:"#581845",
             },{ 
               name: ' Never Let Me Go ' ,
               y: 0.0024842599324825083 ,
               color:"#581845",
             },{ 
               name: ' Raggedy Man ' ,
               y: 0.002445636659707563 ,
               color:"#581845",
             },{ 
               name: ' We Need to Talk About Kevin ' ,
               y: 0.0023934512200015122 ,
               color:"#581845",
             },{ 
               name: ' We Are Your Friends ' ,
               y: 0.0022574160157643607 ,
               color:"#581845",
             },{ 
               name: ' Pennies from Heaven ' ,
               y: 0.002039058458255398 ,
               color:"#581845",
             },{ 
               name: ' The Homesman ' ,
               y: 0.0018270173132466435 ,
               color:"#581845",
             },{ 
               name: ' Miss Sloane ' ,
               y: 0.0017163100115798451 ,
               color:"#581845",
             },{ 
               name: ' The Immigrant ' ,
               y: 0.001686379865477133 ,
               color:"#581845",
             },{ 
               name: ' Tulip Fever ' ,
               y: 0.0015102406666328858 ,
               color:"#581845",
             },{ 
               name: ' Knock Knock ' ,
               y: 0.0014070236830629552 ,
               color:"#581845",
             },{ 
               name: ' Martha Marcy May Marlene ' ,
               y: 0.0012092361531681874 ,
               color:"#581845",
             },{ 
               name: ' Take Shelter ' ,
               y: 0.0011054322167999271 ,
               color:"#581845",
             },{ 
               name: ' Stone ' ,
               y: 0.000903778357676767 ,
               color:"#581845",
             },{ 
               name: ' By the Sea ' ,
               y: 0.0008287929143840789 ,
               color:"#581845",
             },{ 
               name: ' Zoot Suit ' ,
               y: 0.000723926237721873 ,
               color:"#581845",
             },{ 
               name: ' Everything Must Go ' ,
               y: 0.0006271968612183303 ,
               color:"#581845",
             },{ 
               name: ' A Ghost Story ' ,
               y: 0.0006158068643022558 ,
               color:"#581845",
             },{ 
               name: ' The Hand ' ,
               y: 0.000544171943233367 ,
               color:"#581845",
             },{ 
               name: ' Coriolanus ' ,
               y: 0.0004845962342028908 ,
               color:"#581845",
             },{ 
               name: ' Locke ' ,
               y: 0.000464313013069698 ,
               color:"#581845",
             },{ 
               name: ' Ghost Story ' ,
               y: 0.0004339181960016415 ,
               color:"#581845",
             },{ 
               name: ' Palo Alto ' ,
               y: 0.00025708310236240476 ,
               color:"#581845",
             },{ 
               name: ' I Origins ' ,
               y: 0.00018951460806679334 ,
               color:"#581845",
             },{ 
               name: ' Stake Land ' ,
               y: 0.00015106919977619045 ,
               color:"#581845",
             },{ 
               name: ' One from the Heart ' ,
               y: 0.0001415801295365251 ,
               color:"#581845",
             },{ 
               name: ' The Reluctant Fundamentalist ' ,
               y: 0.00011755286100792544 ,
               color:"#581845",
             },{ 
               name: ' Sound of My Voice ' ,
               y: 9.547987850103754e-05 ,
               color:"#581845",
             },{ 
               name: ' Hesher ' ,
               y: 8.514150534863738e-05 ,
               color:"#581845",
             },{ 
               name: ' The Canyons ' ,
               y: 1.3868983166596162e-05 ,
               color:"#581845",
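
Printing names between hand-written single quotes breaks when a title contains an apostrophe, as with 'Winter's Bone' in the output above, and the comma-separated print arguments also add padding spaces inside the quotes. One possible alternative, offered as a sketch rather than as the notebook's method, is to serialize each point with the standard-library `json.dumps`, which handles quoting and escaping. The function name `to_chart_points` is hypothetical, and the two sample pairs are copied from the 'var' list, so the shares here are relative to this small sample only:

```python
import json

def to_chart_points(pairs, color):
    """Build chart data points (name, y = share of total tickets, color).

    `pairs` is a list of (movie, tickets) tuples. Hypothetical helper.
    """
    total = sum(tickets for _, tickets in pairs)
    if total == 0:
        return []
    return [
        {"name": name, "y": tickets / total, "color": color}
        for name, tickets in pairs
    ]

sample = [("Gone Girl", 36856719), ("Winter's Bone", 1613155)]
points = to_chart_points(sample, "#581845")

# json.dumps emits double-quoted strings, so the apostrophe needs no
# special handling and the result can be pasted as a JavaScript array.
print(json.dumps(points, indent=2))
```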
    

    Get the movie names and the number of tickets sold in the NC-17 rated category from the 'Drama_DataFrame' dataframe.

    In [753]:
    # Keep only the NC-17 rows, then pair each movie name with its ticket count.
    nc17 = Drama_DataFrame[Drama_DataFrame.Rating == 'NC-17']
    var1 = list(zip(nc17.Movie, nc17.Tickets))
    print(var1)
    
    [('Shame', 2041284), ('Matador', 1735627), ('Whore', 100840), ('Tokyo Decadence', 27784), ('Wide Sargasso Sea', 161478), ('Kids', 2041222), ('Showgirls', 2035075), ('Crash', 9841006), ('Bent', 49606), ('The Dreamers', 1512116), ('Ma mère', 102215), ('Lust, Caution', 6709192), ('Shame', 2041284), ('Blue Is the Warmest Colour', 1946584), ('Showgirls', 3775075), ('The Dreamers', 1530711), ('Shame', 2041284), ('Blue Is the Warmest Colour', 1946584), ('Blue Valentine', 1656624), ('Two Girls and a Guy', 231503), ('Elles', 382224), ('Hell', 21312000), ('Killer Joe', 465911), ('Se, jie', 6516743), ('Queen of Hearts', 123684), ('The Evil Dead', 266194), ('Man Bites Dog', 20557), ('Shame', 2041284), ('Nymphomaniac: Vol. I', 209430), ('Arabian Nights', 345342), ('Frontier(s)', 278354), ('Chained', 10309), ('Natural Born Killers', 5028356), ('Clerks', 389424), ('Bad Lieutenant', 203892), ('The Big Feast', 69087), ('Beyond the Valley of the Dolls', 900000), ('Kids', 2041222), ('Crash', 10117304), ('Last Tango in Paris', 3614771), ('Pink Flamingos', 41380), ('Lust, Caution ', 6516743), ('Happiness 1998', 574645), ('Orgazmo', 62729), ('A Dirty Shame', 191417), ('Young Adam', 256182), ('Whore 1991', 100840), ('Ma Mère', 102215), ('Law of Desire', 147081)]
    

    After creating the 'var1' list, sort it in descending order by the number of tickets each movie sold.

    In [754]:
    var1.sort(key=lambda i:i[1],reverse=True)
    print(var1)
    
    [('Hell', 21312000), ('Crash', 10117304), ('Crash', 9841006), ('Lust, Caution', 6709192), ('Se, jie', 6516743), ('Lust, Caution ', 6516743), ('Natural Born Killers', 5028356), ('Showgirls', 3775075), ('Last Tango in Paris', 3614771), ('Shame', 2041284), ('Shame', 2041284), ('Shame', 2041284), ('Shame', 2041284), ('Kids', 2041222), ('Kids', 2041222), ('Showgirls', 2035075), ('Blue Is the Warmest Colour', 1946584), ('Blue Is the Warmest Colour', 1946584), ('Matador', 1735627), ('Blue Valentine', 1656624), ('The Dreamers', 1530711), ('The Dreamers', 1512116), ('Beyond the Valley of the Dolls', 900000), ('Happiness 1998', 574645), ('Killer Joe', 465911), ('Clerks', 389424), ('Elles', 382224), ('Arabian Nights', 345342), ('Frontier(s)', 278354), ('The Evil Dead', 266194), ('Young Adam', 256182), ('Two Girls and a Guy', 231503), ('Nymphomaniac: Vol. I', 209430), ('Bad Lieutenant', 203892), ('A Dirty Shame', 191417), ('Wide Sargasso Sea', 161478), ('Law of Desire', 147081), ('Queen of Hearts', 123684), ('Ma mère', 102215), ('Ma Mère', 102215), ('Whore', 100840), ('Whore 1991', 100840), ('The Big Feast', 69087), ('Orgazmo', 62729), ('Bent', 49606), ('Pink Flamingos', 41380), ('Tokyo Decadence', 27784), ('Man Bites Dog', 20557), ('Chained', 10309)]
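
Several entries above tie on ticket count, for example 'Se, jie' and 'Lust, Caution ' at 6516743. Python's sort is stable, including with `reverse=True`, so tied entries keep the order they had in the unsorted list. A small sketch using pairs copied from 'var1' in their original relative order. `operator.itemgetter(1)` is used here as an equivalent to the lambda key, and `sorted` is used so the input list is left unchanged:

```python
from operator import itemgetter

# Pairs copied from the unsorted 'var1' output, in their original order.
pairs = [
    ('Lust, Caution', 6709192),
    ('Se, jie', 6516743),
    ('Natural Born Killers', 5028356),
    ('Lust, Caution ', 6516743),
]

# sorted() returns a new list and leaves 'pairs' untouched;
# itemgetter(1) picks the ticket count, like lambda i: i[1].
ranked = sorted(pairs, key=itemgetter(1), reverse=True)
print(ranked)
```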
    
    In [755]:
    # Collect the ticket counts from 'var1' and print the NC-17 total.
    all_to = [tickets for _, tickets in var1]
    print(sum(all_to))
    
    103856414
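
The 'var1' list contains repeated entries (for instance, ('Shame', 2041284) appears four times), and every repeat is counted in the total above. If those repeats are unintended duplicates rather than distinct releases, which is an assumption the output alone does not settle, exact duplicate pairs could be dropped before summing. A sketch with a hypothetical helper `drop_exact_duplicates`; it removes only exact matches, so near-duplicates that differ in spelling or spacing are left alone:

```python
def drop_exact_duplicates(pairs):
    """Remove repeated (movie, tickets) tuples, keeping first occurrences.

    dict.fromkeys preserves insertion order, so the ranking is unchanged.
    Hypothetical helper, not part of the notebook.
    """
    return list(dict.fromkeys(pairs))

# Pairs copied from the sorted 'var1' output above.
sample = [
    ('Shame', 2041284), ('Shame', 2041284), ('Shame', 2041284),
    ('Shame', 2041284), ('Kids', 2041222), ('Kids', 2041222),
    ('Showgirls', 2035075),
]

unique = drop_exact_duplicates(sample)
unique_total = sum(tickets for _, tickets in unique)
print(unique)
print(unique_total)
```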
    

    A for loop prints each NC-17 rated movie's name and its share of the total NC-17 tickets sold, formatted as data-point entries for the HTML/JavaScript chart code. The printed text is pasted into the cell below to create the interactive JavaScript graph.

    In [774]:
    # Compute the NC-17 total from 'var1' rather than hard-coding 103856414.
    total = sum(tickets for _, tickets in var1)
    for name, tickets in var1:
        print('         },{ \n           name:',"'",name,"'",','+'\n           y:',tickets/total,',','\n           color:"#FF5733",')
    
             },{ 
               name: ' Hell ' ,
               y: 0.20520639197113044 , 
               color:"#FF5733",
             },{ 
               name: ' Crash ' ,
               y: 0.09741626549901868 , 
               color:"#FF5733",
             },{ 
               name: ' Crash ' ,
               y: 0.09475588094154686 , 
               color:"#FF5733",
             },{ 
               name: ' Lust, Caution ' ,
               y: 0.06460065143400773 , 
               color:"#FF5733",
             },{ 
               name: ' Se, jie ' ,
               y: 0.06274762192347601 , 
               color:"#FF5733",
             },{ 
               name: ' Lust, Caution  ' ,
               y: 0.06274762192347601 , 
               color:"#FF5733",
             },{ 
               name: ' Natural Born Killers ' ,
               y: 0.0484164223116735 , 
               color:"#FF5733",
             },{ 
               name: ' Showgirls ' ,
               y: 0.036348982740728945 , 
               color:"#FF5733",
             },{ 
               name: ' Last Tango in Paris ' ,
               y: 0.034805467094213366 , 
               color:"#FF5733",
             },{ 
               name: ' Shame ' ,
               y: 0.019654866958915027 , 
               color:"#FF5733",
             },{ 
               name: ' Shame ' ,
               y: 0.019654866958915027 , 
               color:"#FF5733",
             },{ 
               name: ' Shame ' ,
               y: 0.019654866958915027 , 
               color:"#FF5733",
             },{ 
               name: ' Shame ' ,
               y: 0.019654866958915027 , 
               color:"#FF5733",
             },{ 
               name: ' Kids ' ,
               y: 0.019654269980860305 , 
               color:"#FF5733",
             },{ 
               name: ' Kids ' ,
               y: 0.019654269980860305 , 
               color:"#FF5733",
             },{ 
               name: ' Showgirls ' ,
               y: 0.01959508249533823 , 
               color:"#FF5733",
             },{ 
               name: ' Blue Is the Warmest Colour ' ,
               y: 0.018743031123720486 , 
               color:"#FF5733",
             },{ 
               name: ' Blue Is the Warmest Colour ' ,
               y: 0.018743031123720486 , 
               color:"#FF5733",
             },{ 
               name: ' Matador ' ,
               y: 0.016711794035176298 , 
               color:"#FF5733",
             },{ 
               name: ' Blue Valentine ' ,
               y: 0.015951099563287444 , 
               color:"#FF5733",
             },{ 
               name: ' The Dreamers ' ,
               y: 0.01473872379225418 , 
               color:"#FF5733",
             },{ 
               name: ' The Dreamers ' ,
               y: 0.014559678519229444 , 
               color:"#FF5733",
             },{ 
               name: ' Beyond the Valley of the Dolls ' ,
               y: 0.00866581047175382 , 
               color:"#FF5733",
             },{ 
               name: ' Happiness 1998 ' ,
               y: 0.005533071842823304 , 
               color:"#FF5733",
             },{ 
               name: ' Killer Joe ' ,
               y: 0.0044861071363392156 , 
               color:"#FF5733",
             },{ 
               name: ' Clerks ' ,
               y: 0.003749638419058066 , 
               color:"#FF5733",
             },{ 
               name: ' Elles ' ,
               y: 0.0036803119352840355 , 
               color:"#FF5733",
             },{ 
               name: ' Arabian Nights ' ,
               y: 0.003325187022151564 , 
               color:"#FF5733",
             },{ 
               name: ' Frontier(s) ' ,
               y: 0.0026801811200606253 , 
               color:"#FF5733",
             },{ 
               name: ' The Evil Dead ' ,
               y: 0.0025630963919089293 , 
               color:"#FF5733",
             },{ 
               name: ' Young Adam ' ,
               y: 0.002466694064749819 , 
               color:"#FF5733",
             },{ 
               name: ' Two Girls and a Guy ' ,
               y: 0.002229067912936027 , 
               color:"#FF5733",
             },{ 
               name: ' Nymphomaniac: Vol. I ' ,
               y: 0.002016534096777114 , 
               color:"#FF5733",
             },{ 
               name: ' Bad Lieutenant ' ,
               y: 0.001963210476340922 , 
               color:"#FF5733",
             },{ 
               name: ' A Dirty Shame ' ,
               y: 0.0018430927145241121 , 
               color:"#FF5733",
             },{ 
               name: ' Wide Sargasso Sea ' ,
               y: 0.0015548197148420703 , 
               color:"#FF5733",
             },{ 
               name: ' Law of Desire ' ,
               y: 0.001416195633328915 , 
               color:"#FF5733",
             },{ 
               name: ' Queen of Hearts ' ,
               y: 0.0011909134470982216 , 
               color:"#FF5733",
             },{ 
               name: ' Ma mère ' ,
               y: 0.0009841953526336853 , 
               color:"#FF5733",
             },{ 
               name: ' Ma Mère ' ,
               y: 0.0009841953526336853 , 
               color:"#FF5733",
             },{ 
               name: ' Whore ' ,
               y: 0.0009709559199685058 , 
               color:"#FF5733",
             },{ 
               name: ' Whore 1991 ' ,
               y: 0.0009709559199685058 , 
               color:"#FF5733",
             },{ 
               name: ' The Big Feast ' ,
               y: 0.0006652164978467291 , 
               color:"#FF5733",
             },{ 
               name: ' Orgazmo ' ,
               y: 0.0006039973612029393 , 
               color:"#FF5733",
             },{ 
               name: ' Bent ' ,
               y: 0.00047764021584646666 , 
               color:"#FF5733",
             },{ 
               name: ' Pink Flamingos ' ,
               y: 0.00039843470813463673 , 
               color:"#FF5733",
             },{ 
               name: ' Tokyo Decadence ' ,
               y: 0.0002675231979413424 , 
               color:"#FF5733",
             },{ 
               name: ' Man Bites Dog ' ,
               y: 0.0001979367398531592 , 
               color:"#FF5733",
             },{ 
               name: ' Chained ' ,
               y: 9.926204461478904e-05 , 
               color:"#FF5733",
    

    Get the movie names and the number of tickets sold in the PG-rated category from the 'Drama_DataFrame' dataframe.

    In [737]:
    # Keep only the PG rows, then pair each movie name with its ticket count.
    pg = Drama_DataFrame[Drama_DataFrame.Rating == 'PG']
    var2 = list(zip(pg.Movie, pg.Tickets))
    print(var2)
    
    [('Hugo', 18004778), ('Dolphin Tale', 9606872), ('Extraordinary Measures', 1582698), ('Wonder', 30460471), ('The Last Song', 9267895), ('War Room', 7397524), ('The Lunchbox', 1223150), ('Somewhere in Time', 970960), ('Urban Cowboy', 4691829), ('Cinderella', 54235135), ('War Room', 7398690), ('Wonder', 30593772), ('Little Women', 21660121), ('Overcomer', 3810299), ('The Jazz Singer', 2711800), ('Cattle Annie and Little Britches', 53482), ('The Majestic', 3730633), ('A Walk to Remember', 4749492), ('Tuck Everlasting', 1934462), ('Dreamer', 3874173), ('The Lake House', 11483011), ('We Are Marshall', 4354536), ('Akeelah and the Bee', 1894842), ('The Ultimate Gift', 343874), ('Bridge to Terabithia', 13758706), ('August Rush', 6460576), ('Fireproof', 3347330), ('The Last Song', 8913705), ('What If...', 852629), ("God's Not Dead", 6466787), ("Mr. Holland's Opus", 10626997), ('The Indian in the Cupboard', 3565613), ('Fluke', 398777), ('Three Wishes', 702550), ('Phenomenon', 15203638), ('Contact', 17112033), ('The Spanish Prisoner', 1383513), ('Music of the Heart', 1485939), ('Sense and Sensibility', 13458278), ('The Secret of Roan Inish', 610182), ('The Remains of the Day', 6395497), ('Gettysburg', 1076996), ('The Age of Innocence', 3225544), ('Pure Country', 1516446), ('Forever Young', 12795619), ('Newsies', 281948), ('A River Runs Through It', 4344029), ('Honeysuckle Rose', 1781521), ('Resurrection', 15729752), ('Taps', 3585605), ('On Golden Pond', 11928543), ('Absence of Malice', 4071696), ('Ragtime', 1492078), ('Looker', 328123), ('The Night the Lights Went Out in Georgia', 1492375), ('Rocky III', 12505269), ('Tex', 54936832), ('Six Weeks', 666802), ('Five Days One Summer', 19908), ('Staying Alive', 6489267), ('Eddie and the Cruisers', 478679), ('Tender Mercies', 844312), ('Testament', 204489), ('Table for Five', 240000), ('Man, Woman and Child', 170591), ('Footloose', 8000894), ('The Natural', 4800000)]
    

    After creating the 'var2' list, sort it in descending order by the number of tickets each movie sold.

    In [738]:
    var2.sort(key=lambda i:i[1],reverse=True)
    print(var2)
    
    [('Tex', 54936832), ('Cinderella', 54235135), ('Wonder', 30593772), ('Wonder', 30460471), ('Little Women', 21660121), ('Hugo', 18004778), ('Contact', 17112033), ('Resurrection', 15729752), ('Phenomenon', 15203638), ('Bridge to Terabithia', 13758706), ('Sense and Sensibility', 13458278), ('Forever Young', 12795619), ('Rocky III', 12505269), ('On Golden Pond', 11928543), ('The Lake House', 11483011), ("Mr. Holland's Opus", 10626997), ('Dolphin Tale', 9606872), ('The Last Song', 9267895), ('The Last Song', 8913705), ('Footloose', 8000894), ('War Room', 7398690), ('War Room', 7397524), ('Staying Alive', 6489267), ("God's Not Dead", 6466787), ('August Rush', 6460576), ('The Remains of the Day', 6395497), ('The Natural', 4800000), ('A Walk to Remember', 4749492), ('Urban Cowboy', 4691829), ('We Are Marshall', 4354536), ('A River Runs Through It', 4344029), ('Absence of Malice', 4071696), ('Dreamer', 3874173), ('Overcomer', 3810299), ('The Majestic', 3730633), ('Taps', 3585605), ('The Indian in the Cupboard', 3565613), ('Fireproof', 3347330), ('The Age of Innocence', 3225544), ('The Jazz Singer', 2711800), ('Tuck Everlasting', 1934462), ('Akeelah and the Bee', 1894842), ('Honeysuckle Rose', 1781521), ('Extraordinary Measures', 1582698), ('Pure Country', 1516446), ('The Night the Lights Went Out in Georgia', 1492375), ('Ragtime', 1492078), ('Music of the Heart', 1485939), ('The Spanish Prisoner', 1383513), ('The Lunchbox', 1223150), ('Gettysburg', 1076996), ('Somewhere in Time', 970960), ('What If...', 852629), ('Tender Mercies', 844312), ('Three Wishes', 702550), ('Six Weeks', 666802), ('The Secret of Roan Inish', 610182), ('Eddie and the Cruisers', 478679), ('Fluke', 398777), ('The Ultimate Gift', 343874), ('Looker', 328123), ('Newsies', 281948), ('Table for Five', 240000), ('Testament', 204489), ('Man, Woman and Child', 170591), ('Cattle Annie and Little Britches', 53482), ('Five Days One Summer', 19908)]
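
The R, NC-17 and PG cells repeat the same filter-then-sort steps and differ only in the rating string. A possible refactoring, sketched with a hypothetical function `tickets_by_rating` that accepts any three parallel sequences. Plain lists are used here; iterable columns such as `Drama_DataFrame.Rating` should work the same way, though that is an assumption not exercised in this sketch:

```python
def tickets_by_rating(ratings, movies, tickets, wanted):
    """Return (movie, tickets) pairs whose rating equals `wanted`,
    sorted by tickets in descending order. Hypothetical helper."""
    pairs = [
        (movie, sold)
        for rating, movie, sold in zip(ratings, movies, tickets)
        if rating == wanted
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Toy columns; the values are copied from the outputs in this section.
ratings = ['PG', 'R', 'PG', 'NC-17', 'PG']
movies = ['Hugo', 'Gone Girl', 'Wonder', 'Kids', 'Dolphin Tale']
tickets = [18004778, 36856719, 30460471, 2041222, 9606872]

pg_pairs = tickets_by_rating(ratings, movies, tickets, 'PG')
print(pg_pairs)
```

Calling the same function with 'R' or 'NC-17' would replace the separate 'var' and 'var1' cells.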
    
    In [739]:
    # Collect the ticket counts from 'var2' and print the PG total.
    all_to = [tickets for _, tickets in var2]
    print(sum(all_to))
    
    499784567
    

    A for loop prints each PG-rated movie's name and its share of the total PG tickets sold, formatted as data-point entries for the HTML/JavaScript chart code. The printed text is pasted into the cell below to create the interactive JavaScript graph.

    In [776]:
    # Compute the PG total from 'var2' rather than hard-coding 499784567.
    total = sum(tickets for _, tickets in var2)
    for name, tickets in var2:
        print('         },{ \n           name:',"'",name,"'",','+'\n           y:',tickets/total,',','\n           color:"#C70039",')
    
             },{ 
               name: ' Tex ' ,
               y: 0.10992102523245781 , 
               color:"#C70039",
             },{ 
               name: ' Cinderella ' ,
               y: 0.10851702629705251 , 
               color:"#C70039",
             },{ 
               name: ' Wonder ' ,
               y: 0.06121391899642231 , 
               color:"#C70039",
             },{ 
               name: ' Wonder ' ,
               y: 0.06094720207717018 , 
               color:"#C70039",
             },{ 
               name: ' Little Women ' ,
               y: 0.043338915265064594 , 
               color:"#C70039",
             },{ 
               name: ' Hugo ' ,
               y: 0.036025077981249466 , 
               color:"#C70039",
             },{ 
               name: ' Contact ' ,
               y: 0.03423881834270405 , 
               color:"#C70039",
             },{ 
               name: ' Resurrection ' ,
               y: 0.031473064673483604 , 
               color:"#C70039",
             },{ 
               name: ' Phenomenon ' ,
               y: 0.03042038310878855 , 
               color:"#C70039",
             },{ 
               name: ' Bridge to Terabithia ' ,
               y: 0.027529273427924796 , 
               color:"#C70039",
             },{ 
               name: ' Sense and Sensibility ' ,
               y: 0.0269281584279092 , 
               color:"#C70039",
             },{ 
               name: ' Forever Young ' ,
               y: 0.02560226914729842 , 
               color:"#C70039",
             },{ 
               name: ' Rocky III ' ,
               y: 0.025021318835561402 , 
               color:"#C70039",
             },{ 
               name: ' On Golden Pond ' ,
               y: 0.023867369638086482 , 
               color:"#C70039",
             },{ 
               name: ' The Lake House ' ,
               y: 0.022975921543411725 , 
               color:"#C70039",
             },{ 
               name: ' Mr. Holland's Opus ' ,
               y: 0.021263155570788162 , 
               color:"#C70039",
             },{ 
               name: ' Dolphin Tale ' ,
               y: 0.019222026117505144 , 
               color:"#C70039",
             },{ 
               name: ' The Last Song ' ,
               y: 0.018543779884263614 , 
               color:"#C70039",
             },{ 
               name: ' The Last Song ' ,
               y: 0.01783509453584228 , 
               color:"#C70039",
             },{ 
               name: ' Footloose ' ,
               y: 0.01600868559832901 , 
               color:"#C70039",
             },{ 
               name: ' War Room ' ,
               y: 0.014803758436182365 , 
               color:"#C70039",
             },{ 
               name: ' War Room ' ,
               y: 0.01480142543096974 , 
               color:"#C70039",
             },{ 
               name: ' Staying Alive ' ,
               y: 0.012984128419475586 , 
               color:"#C70039",
             },{ 
               name: ' God's Not Dead ' ,
               y: 0.012939149039390006 , 
               color:"#C70039",
             },{ 
               name: ' August Rush ' ,
               y: 0.012926721684865472 , 
               color:"#C70039",
             },{ 
               name: ' The Remains of the Day ' ,
               y: 0.012796507580034979 , 
               color:"#C70039",
             },{ 
               name: ' The Natural ' ,
               y: 0.009604138096565115 , 
               color:"#C70039",
             },{ 
               name: ' A Walk to Remember ' ,
               y: 0.009503078553444008 , 
               color:"#C70039",
             },{ 
               name: ' Urban Cowboy ' ,
               y: 0.00938770284197271 , 
               color:"#C70039",
             },{ 
               name: ' We Are Marshall ' ,
               y: 0.00871282606051339 , 
               color:"#C70039",
             },{ 
               name: ' A River Runs Through It ' ,
               y: 0.008691803002392428 , 
               color:"#C70039",
             },{ 
               name: ' Absence of Malice ' ,
               y: 0.00814690222317329 , 
               color:"#C70039",
             },{ 
               name: ' Dreamer ' ,
               y: 0.007751685937913325 , 
               color:"#C70039",
             },{ 
               name: ' Overcomer ' ,
               y: 0.0076238828719174916 , 
               color:"#C70039",
             },{ 
               name: ' The Majestic ' ,
               y: 0.007464482191583959 , 
               color:"#C70039",
             },{ 
               name: ' Taps ' ,
               y: 0.007174301162444658 , 
               color:"#C70039",
             },{ 
               name: ' The Indian in the Cupboard ' ,
               y: 0.007134299927272464 , 
               color:"#C70039",
             },{ 
               name: ' Fireproof ' ,
               y: 0.006697545744744855 , 
               color:"#C70039",
             },{ 
               name: ' The Age of Innocence ' ,
               y: 0.006453868752613964 , 
               color:"#C70039",
             },{ 
               name: ' The Jazz Singer ' ,
               y: 0.0054259378521386 , 
               color:"#C70039",
             },{ 
               name: ' Tuck Everlasting ' ,
               y: 0.0038705917063661553 , 
               color:"#C70039",
             },{ 
               name: ' Akeelah and the Bee ' ,
               y: 0.003791317549827424 , 
               color:"#C70039",
             },{ 
               name: ' Honeysuckle Rose ' ,
               y: 0.0035645778554022458 , 
               color:"#C70039",
             },{ 
               name: ' Extraordinary Measures ' ,
               y: 0.0031667604494077946 , 
               color:"#C70039",
             },{ 
               name: ' Pure Country ' ,
               y: 0.0030341993333299544 , 
               color:"#C70039",
             },{ 
               name: ' The Night the Lights Went Out in Georgia ' ,
               y: 0.002986036581637784 , 
               color:"#C70039",
             },{ 
               name: ' Ragtime ' ,
               y: 0.002985442325593059 , 
               color:"#C70039",
             },{ 
               name: ' Music of the Heart ' ,
               y: 0.002973159033139973 , 
               color:"#C70039",
             },{ 
               name: ' The Spanish Prisoner ' ,
               y: 0.002768218731331894 , 
               color:"#C70039",
             },{ 
               name: ' The Lunchbox ' ,
               y: 0.002447354481836171 , 
               color:"#C70039",
             },{ 
               name: ' Gettysburg ' ,
               y: 0.002154920481968384 , 
               color:"#C70039",
             },{ 
               name: ' Somewhere in Time ' ,
               y: 0.0019427570679668466 , 
               color:"#C70039",
             },{ 
               name: ' What If... ' ,
               y: 0.0017059930544033786 , 
               color:"#C70039",
             },{ 
               name: ' Tender Mercies ' ,
               y: 0.001689351884288976 , 
               color:"#C70039",
             },{ 
               name: ' Three Wishes ' ,
               y: 0.0014057056707795462 , 
               color:"#C70039",
             },{ 
               name: ' Six Weeks ' ,
               y: 0.0013341788523053774 , 
               color:"#C70039",
             },{ 
               name: ' The Secret of Roan Inish ' ,
               y: 0.0012208900400079781 , 
               color:"#C70039",
             },{ 
               name: ' Eddie and the Cruisers ' ,
               y: 0.0009577706708178526 , 
               color:"#C70039",
             },{ 
               name: ' Fluke ' ,
               y: 0.0007978977870279055 , 
               color:"#C70039",
             },{ 
               name: ' The Ultimate Gift ' ,
               y: 0.0006880444549621317 , 
               color:"#C70039",
             },{ 
               name: ' Looker ' ,
               y: 0.000656528875970674 , 
               color:"#C70039",
             },{ 
               name: ' Newsies ' ,
               y: 0.000564139068343821 , 
               color:"#C70039",
             },{ 
               name: ' Table for Five ' ,
               y: 0.0004802069048282557 , 
               color:"#C70039",
             },{ 
               name: ' Testament ' ,
               y: 0.0004091542906726049 , 
               color:"#C70039",
             },{ 
               name: ' Man, Woman and Child ' ,
               y: 0.0003413290670898207 , 
               color:"#C70039",
             },{ 
               name: ' Cattle Annie and Little Britches ' ,
               y: 0.00010701010701676988 , 
               color:"#C70039",
             },{ 
               name: ' Five Days One Summer ' ,
               y: 3.983316275550381e-05 , 
               color:"#C70039",
    

    Getting the number of Tickets sold in the G-rated category and the Names of the Movies from the 'Drama_DataFrame' dataframe.

    In [748]:
    var3 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'G':
            var3.append((Drama_DataFrame.Movie[i],Drama_DataFrame.Tickets[i]))
    print(var3)
    
    [('La traviata', 19549), ('A Sunday in the Country', 241114), ('Little Dorrit', 102523), ('Prancer', 1858714), ('The Secret Garden', 872124), ('Through the Olive Trees', 4030), ('A Little Princess', 1001545), ('The Rookie', 8069354), ('Beauty and the Beast 1991', 43865684), ('The Little Rascals', 6694795), ('Ramona and Beezus', 2746962), ('The Black Stallion', 3779964), ('The Hunchback of Notre Drame', 32550000), ('Babe', 24610000), ('Pollyanna', 375000), ('Babe: Pig in the City', 6913186), ('Lassie Come Home', 451700), ("Charlotte's Web", 14398571), ('A Little Princess', 1001545), ('Kit Kittredge: An American Girl', 1765797), ('The Rookie', 8049152), ('The Secret Garden', 31128100), ('The Sound of Music', 28621420), ('The Tale of Despereaux', 9048232), ('The Lion King 1994', 98621487), ('Bambi 1942', 26800000), ('My Fair Lady 1964', 7207164), ('Before the Wrath', 10900), ("Hachiko: A Dog's Story", 4770742), ('Giant', 3019441), ('The Ten Commandments 1966', 6550000), ('The Quiet Man', 760038), ('Three Cions in the Fountain', 1200000), ('Miracle of Marcelino', 59286)]
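
The loop above walks the `Rating` column and indexes `Movie` and `Tickets` by position. A minimal sketch of the same filter in plain Python follows, assuming the three columns can be read as equal-length sequences. The short lists and the `tickets_for_rating` helper are hypothetical stand-ins for illustration, not the full `Drama_DataFrame`.

```python
# Hypothetical stand-ins for Drama_DataFrame.Movie / .Rating / .Tickets
movies = ["La traviata", "Gravity", "Prancer", "Babe"]
ratings = ["G", "PG-13", "G", "G"]
tickets = [19549, 69369867, 1858714, 24610000]


def tickets_for_rating(movies, ratings, tickets, wanted):
    """Return (movie, tickets) pairs whose rating equals `wanted`."""
    return [(m, t) for m, r, t in zip(movies, ratings, tickets) if r == wanted]


g_rated = tickets_for_rating(movies, ratings, tickets, "G")
print(g_rated)
```

Pairing the columns with `zip` avoids positional lookups, so the filter does not depend on the dataframe having a default 0..n-1 index.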
    

    After creating the 'var3' list, sort it in descending order by the number of tickets each movie sold.

    In [749]:
    var3.sort(key=lambda i:i[1],reverse=True)
    print(var3)
    
    [('The Lion King 1994', 98621487), ('Beauty and the Beast 1991', 43865684), ('The Hunchback of Notre Drame', 32550000), ('The Secret Garden', 31128100), ('The Sound of Music', 28621420), ('Bambi 1942', 26800000), ('Babe', 24610000), ("Charlotte's Web", 14398571), ('The Tale of Despereaux', 9048232), ('The Rookie', 8069354), ('The Rookie', 8049152), ('My Fair Lady 1964', 7207164), ('Babe: Pig in the City', 6913186), ('The Little Rascals', 6694795), ('The Ten Commandments 1966', 6550000), ("Hachiko: A Dog's Story", 4770742), ('The Black Stallion', 3779964), ('Giant', 3019441), ('Ramona and Beezus', 2746962), ('Prancer', 1858714), ('Kit Kittredge: An American Girl', 1765797), ('Three Cions in the Fountain', 1200000), ('A Little Princess', 1001545), ('A Little Princess', 1001545), ('The Secret Garden', 872124), ('The Quiet Man', 760038), ('Lassie Come Home', 451700), ('Pollyanna', 375000), ('A Sunday in the Country', 241114), ('Little Dorrit', 102523), ('Miracle of Marcelino', 59286), ('La traviata', 19549), ('Before the Wrath', 10900), ('Through the Olive Trees', 4030)]
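
`list.sort` reorders `var3` in place. As a sketch of an alternative, `sorted` with `operator.itemgetter` returns a new ranked list and leaves the input untouched. The three pairs below are copied from the G-rated output, and the names `pairs` and `ranked` are illustrative only.

```python
from operator import itemgetter

# A few (movie, tickets) pairs copied from the G-rated output above
pairs = [("Babe", 24610000), ("The Lion King 1994", 98621487), ("Pollyanna", 375000)]

# sorted() builds a new list; `pairs` keeps its original order
ranked = sorted(pairs, key=itemgetter(1), reverse=True)
print(ranked)
```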
    
    In [750]:
    all_to = []
    for i in var3:all_to.append(i[1])
    print(sum(all_to))
    
    377168119
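
The total above is the denominator for every G-rated share in the chart. A minimal sketch of the total and the share calculation follows, assuming the data is a list of `(movie, tickets)` pairs. The `share` helper is hypothetical and the two pairs are a truncated sample, so `sample_total` is not the full G-rated total.

```python
# Truncated sample of the sorted G-rated list
pairs = [("The Lion King 1994", 98621487), ("Beauty and the Beast 1991", 43865684)]

# Summing with a generator expression replaces the append-then-sum loop
sample_total = sum(t for _, t in pairs)

G_TOTAL = 377168119  # total printed by the cell above


def share(tickets, total):
    """Fraction of `total` contributed by one movie."""
    return tickets / total


shares = {name: share(t, G_TOTAL) for name, t in pairs}
print(sample_total, shares)
```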
    

    Using a for loop to format the name and the share of tickets sold for each G-rated movie as chart data points in HTML/JavaScript code, which will be pasted into the cell below to create the interactive JavaScript graph.

    In [751]:
    for i in range(len(var3)):
        print('         },{ \n           name:',"'",var3[i][0],"'",','+'\n           y:',var3[i][1]/377168119,',','\n           sliced:true,'+'\n           color:"#FFAA00",')
    
             },{ 
               name: ' The Lion King 1994 ' ,
               y: 0.26147885261744513 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Beauty and the Beast 1991 ' ,
               y: 0.11630273554483538 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Hunchback of Notre Drame ' ,
               y: 0.0863010375487224 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Secret Garden ' ,
               y: 0.08253110067343736 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Sound of Music ' ,
               y: 0.07588504584079123 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Bambi 1942 ' ,
               y: 0.07105584658389433 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Babe ' ,
               y: 0.06524941732946415 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Charlotte's Web ' ,
               y: 0.03817547208967575 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Tale of Despereaux ' ,
               y: 0.02398991734505535 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Rookie ' ,
               y: 0.021394581337878135 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Rookie ' ,
               y: 0.021341019016509186 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' My Fair Lady 1964 ' ,
               y: 0.019108624607797248 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Babe: Pig in the City ' ,
               y: 0.018329189694847987 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Little Rascals ' ,
               y: 0.017750161433978465 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Ten Commandments 1966 ' ,
               y: 0.017366261012108503 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Hachiko: A Dog's Story ' ,
               y: 0.012648847449378404 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Black Stallion ' ,
               y: 0.010021960525247894 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Giant ' ,
               y: 0.008005557330788077 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Ramona and Beezus ' ,
               y: 0.007283123524021923 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Prancer ' ,
               y: 0.004928078239825991 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Kit Kittredge: An American Girl ' ,
               y: 0.004681723907847047 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Three Cions in the Fountain ' ,
               y: 0.0031816050709206414 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' A Little Princess ' ,
               y: 0.002655433875629345 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' A Little Princess ' ,
               y: 0.002655433875629345 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Secret Garden ' ,
               y: 0.002312295117392995 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Quiet Man ' ,
               y: 0.002015117295743652 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Lassie Come Home ' ,
               y: 0.0011976091754457114 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Pollyanna ' ,
               y: 0.0009942515846627004 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' A Sunday in the Country ' ,
               y: 0.0006392746042249663 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Little Dorrit ' ,
               y: 0.0002718230805716641 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Miracle of Marcelino ' ,
               y: 0.0001571871985288343 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' La traviata ' ,
               y: 5.183099794285635e-05 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Before the Wrath ' ,
               y: 2.8899579394195828e-05 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Through the Olive Trees ' ,
               y: 1.0684890363175155e-05 , 
               sliced:true,
               color:"#FFAA00",
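
Titles that contain an apostrophe, such as "Charlotte's Web", end the single-quoted JavaScript string early when the points are printed by hand. The sketch below builds the same points as dictionaries and serializes them with `json.dumps`, which emits double-quoted strings with proper escaping. The `to_series_points` helper is hypothetical, and it assumes the chart accepts an array of objects with the `name`, `y`, `sliced`, and `color` keys seen in the output above.

```python
import json


def to_series_points(pairs, total, color, sliced=False):
    """Build one chart point per (movie, tickets) pair."""
    points = []
    for name, tickets in pairs:
        point = {"name": name, "y": tickets / total, "color": color}
        if sliced:
            point["sliced"] = True
        points.append(point)
    return points


# Two pairs from the G-rated list; 377168119 is the G-rated total
pairs = [("The Lion King 1994", 98621487), ("Charlotte's Web", 14398571)]
points = to_series_points(pairs, 377168119, "#FFAA00", sliced=True)

# The JSON text is also a valid JavaScript array literal
js_literal = json.dumps(points, indent=2)
print(js_literal)
```

The printed text can be pasted where the hand-built points went, and the movie names no longer carry the padding spaces that the multi-argument `print` call adds.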
    

    Getting the number of Tickets sold in the PG-13 rated category and the Names of the Movies from the 'Drama_DataFrame' dataframe.

    In [743]:
    var4 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
        if x == 'PG-13':
            var4.append((Drama_DataFrame.Movie[i],Drama_DataFrame.Tickets[i]))
    print(var4)
    
    [('Gravity', 69369867), ('Sing', 63445479), ('Contagion', 13755159), ('Trouble with the Curve', 4781891), ('Burlesque', 9055268), ('Creed II', 21359152), ('The Post', 17974888), ('Hereafter', 10866027), ('Dream House', 4164217), ('Upside Down', 2638704), ('Anna Karenina', 7100463), ('Arrival', 20312789), ('Charlie St. Cloud', 4847808), ('Bridge of Spies', 16249834), ('The Impossible', 16959061), ('Paranoia', 1634077), ('Victor Frankenstein', 3112437), ('Water for Elephants', 11680972), ('Creed', 17356758), ('The Rite', 9714399), ('Collateral Beauty', 8530909), ('True Grit', 25227693), ('The Tree of Life', 6172183), ('The Longest Ride', 6380293), ('Step Up Revolution', 16555229), ('The Vow', 19761816), ('The Age of Adaline', 6898454), ('The Space Between Us', 1648140), ('Safe Haven', 9405095), ('Anonymous', 1581551), ('The Best of Me', 4105942), ('The Help', 21312000), ('Dear John', 14203351), ('The Lucky One', 9663383), ('The Giver', 6654020), ('Draft Day', 2984748), ('Rings', 8291728), ('Fences', 6428288), ('The Beaver', 504604), ('Me Before You', 20826520), ('The Light Between Oceans', 2228173), ('The Book Thief', 7608671), ('Labor Day', 1418981), ('Midnight Special', 768025), ('A Quiet Place', 33452229), ('Beastly', 3802823), ('The Roommate', 5254571), ('Remember Me', 5650612), ('The Woman in Black', 12895590), ('Country Strong', 2060199), ('One Day', 5916869), ('Suffragette', 3404491), ('The Perks of Being a Wallflower', 3306930), ('Project Almanac', 3290944), ('Wish Upon', 2347734), ('If I Stay', 7835617), ('Brooklyn', 6207614), ('Everything, Everything', 6160314), ('Mud', 3155696), ('Amour', 3678704), ('Ouija: Origin of Evil', 8183187), ('Black or White', 2197102), ('The Bye Bye Man', 3118773), ('Gifted', 3696466), ('The Words', 1636971), ('Lights Out', 14880651), ('Still Alice', 4169961), ('Before I Fall', 1894568), ('Rabbit Hole', 620503), ('Maggie', 102776), ('Anna', 120000), ('Ida', 1529836), ('Courageous', 3518588), ('Mustang', 555258), ('Like Crazy', 372840), ('Another Earth', 210278)]
    

    After creating the 'var4' list, sort it in descending order by the number of tickets each movie sold.

    In [744]:
    var4.sort(key=lambda i:i[1],reverse=True)
    print(var4)
    
    [('Gravity', 69369867), ('Sing', 63445479), ('A Quiet Place', 33452229), ('True Grit', 25227693), ('Creed II', 21359152), ('The Help', 21312000), ('Me Before You', 20826520), ('Arrival', 20312789), ('The Vow', 19761816), ('The Post', 17974888), ('Creed', 17356758), ('The Impossible', 16959061), ('Step Up Revolution', 16555229), ('Bridge of Spies', 16249834), ('Lights Out', 14880651), ('Dear John', 14203351), ('Contagion', 13755159), ('The Woman in Black', 12895590), ('Water for Elephants', 11680972), ('Hereafter', 10866027), ('The Rite', 9714399), ('The Lucky One', 9663383), ('Safe Haven', 9405095), ('Burlesque', 9055268), ('Collateral Beauty', 8530909), ('Rings', 8291728), ('Ouija: Origin of Evil', 8183187), ('If I Stay', 7835617), ('The Book Thief', 7608671), ('Anna Karenina', 7100463), ('The Age of Adaline', 6898454), ('The Giver', 6654020), ('Fences', 6428288), ('The Longest Ride', 6380293), ('Brooklyn', 6207614), ('The Tree of Life', 6172183), ('Everything, Everything', 6160314), ('One Day', 5916869), ('Remember Me', 5650612), ('The Roommate', 5254571), ('Charlie St. Cloud', 4847808), ('Trouble with the Curve', 4781891), ('Still Alice', 4169961), ('Dream House', 4164217), ('The Best of Me', 4105942), ('Beastly', 3802823), ('Gifted', 3696466), ('Amour', 3678704), ('Courageous', 3518588), ('Suffragette', 3404491), ('The Perks of Being a Wallflower', 3306930), ('Project Almanac', 3290944), ('Mud', 3155696), ('The Bye Bye Man', 3118773), ('Victor Frankenstein', 3112437), ('Draft Day', 2984748), ('Upside Down', 2638704), ('Wish Upon', 2347734), ('The Light Between Oceans', 2228173), ('Black or White', 2197102), ('Country Strong', 2060199), ('Before I Fall', 1894568), ('The Space Between Us', 1648140), ('The Words', 1636971), ('Paranoia', 1634077), ('Anonymous', 1581551), ('Ida', 1529836), ('Labor Day', 1418981), ('Midnight Special', 768025), ('Rabbit Hole', 620503), ('Mustang', 555258), ('The Beaver', 504604), ('Like Crazy', 372840), ('Another Earth', 210278), ('Anna', 120000), ('Maggie', 102776)]
    
    In [745]:
    all_to = []
    for i in var4:all_to.append(i[1])
    print(sum(all_to))
    
    690767742
    

    Using a for loop to format the name and the percentage of tickets sold for each PG-13 rated movie as chart data points in HTML/JavaScript code, which will be pasted into the cell below to create the interactive JavaScript graph.

    In [764]:
    for i,x in enumerate(range(len(var4))):
        print('         },{ \n           name:',"'",var4[i][0],"'",','+'\n           y:',var4[i][1]/690767742*100,',','\n           color:"#900C3F",')
    
             },{ 
               name: ' Gravity ' ,
               y: 10.04243000681407 , 
               color:"#900C3F",
             },{ 
               name: ' Sing ' ,
               y: 9.184777334318545 , 
               color:"#900C3F",
             },{ 
               name: ' A Quiet Place ' ,
               y: 4.8427607379500355 , 
               color:"#900C3F",
             },{ 
               name: ' True Grit ' ,
               y: 3.652123784321127 , 
               color:"#900C3F",
             },{ 
               name: ' Creed II ' ,
               y: 3.0920888022590955 , 
               color:"#900C3F",
             },{ 
               name: ' The Help ' ,
               y: 3.08526277418438 , 
               color:"#900C3F",
             },{ 
               name: ' Me Before You ' ,
               y: 3.014981553669598 , 
               color:"#900C3F",
             },{ 
               name: ' Arrival ' ,
               y: 2.9406105359216386 , 
               color:"#900C3F",
             },{ 
               name: ' The Vow ' ,
               y: 2.860848125707642 , 
               color:"#900C3F",
             },{ 
               name: ' The Post ' ,
               y: 2.6021608866616703 , 
               color:"#900C3F",
             },{ 
               name: ' Creed ' ,
               y: 2.5126763953606854 , 
               color:"#900C3F",
             },{ 
               name: ' The Impossible ' ,
               y: 2.4551032089162033 , 
               color:"#900C3F",
             },{ 
               name: ' Step Up Revolution ' ,
               y: 2.3966418802457627 , 
               color:"#900C3F",
             },{ 
               name: ' Bridge of Spies ' ,
               y: 2.352430927499796 , 
               color:"#900C3F",
             },{ 
               name: ' Lights Out ' ,
               y: 2.154219152868317 , 
               color:"#900C3F",
             },{ 
               name: ' Dear John ' ,
               y: 2.0561688301883674 , 
               color:"#900C3F",
             },{ 
               name: ' Contagion ' ,
               y: 1.9912856613967362 , 
               color:"#900C3F",
             },{ 
               name: ' The Woman in Black ' ,
               y: 1.866848901001518 , 
               color:"#900C3F",
             },{ 
               name: ' Water for Elephants ' ,
               y: 1.691012954105202 , 
               color:"#900C3F",
             },{ 
               name: ' Hereafter ' ,
               y: 1.5730362521763501 , 
               color:"#900C3F",
             },{ 
               name: ' The Rite ' ,
               y: 1.4063191445323746 , 
               color:"#900C3F",
             },{ 
               name: ' The Lucky One ' ,
               y: 1.398933738860087 , 
               color:"#900C3F",
             },{ 
               name: ' Safe Haven ' ,
               y: 1.3615422996981814 , 
               color:"#900C3F",
             },{ 
               name: ' Burlesque ' ,
               y: 1.310899083645976 , 
               color:"#900C3F",
             },{ 
               name: ' Collateral Beauty ' ,
               y: 1.2349894879717762 , 
               color:"#900C3F",
             },{ 
               name: ' Rings ' ,
               y: 1.2003641015419624 , 
               color:"#900C3F",
             },{ 
               name: ' Ouija: Origin of Evil ' ,
               y: 1.1846510053157635 , 
               color:"#900C3F",
             },{ 
               name: ' If I Stay ' ,
               y: 1.1343345271615188 , 
               color:"#900C3F",
             },{ 
               name: ' The Book Thief ' ,
               y: 1.1014803583575563 , 
               color:"#900C3F",
             },{ 
               name: ' Anna Karenina ' ,
               y: 1.0279088857626475 , 
               color:"#900C3F",
             },{ 
               name: ' The Age of Adaline ' ,
               y: 0.9986647581467405 , 
               color:"#900C3F",
             },{ 
               name: ' The Giver ' ,
               y: 0.9632789135078054 , 
               color:"#900C3F",
             },{ 
               name: ' Fences ' ,
               y: 0.9306004911850675 , 
               color:"#900C3F",
             },{ 
               name: ' The Longest Ride ' ,
               y: 0.9236524249854158 , 
               color:"#900C3F",
             },{ 
               name: ' Brooklyn ' ,
               y: 0.8986542976119462 , 
               color:"#900C3F",
             },{ 
               name: ' The Tree of Life ' ,
               y: 0.8935250772031564 , 
               color:"#900C3F",
             },{ 
               name: ' Everything, Everything ' ,
               y: 0.8918068441012986 , 
               color:"#900C3F",
             },{ 
               name: ' One Day ' ,
               y: 0.8565641734903133 , 
               color:"#900C3F",
             },{ 
               name: ' Remember Me ' ,
               y: 0.8180190904166453 , 
               color:"#900C3F",
             },{ 
               name: ' The Roommate ' ,
               y: 0.7606856372282654 , 
               color:"#900C3F",
             },{ 
               name: ' Charlie St. Cloud ' ,
               y: 0.7017999980664992 , 
               color:"#900C3F",
             },{ 
               name: ' Trouble with the Curve ' ,
               y: 0.6922574273886692 , 
               color:"#900C3F",
             },{ 
               name: ' Still Alice ' ,
               y: 0.6036704881334775 , 
               color:"#900C3F",
             },{ 
               name: ' Dream House ' ,
               y: 0.6028389495929878 , 
               color:"#900C3F",
             },{ 
               name: ' The Best of Me ' ,
               y: 0.5944026841948274 , 
               color:"#900C3F",
             },{ 
               name: ' Beastly ' ,
               y: 0.5505212199095423 , 
               color:"#900C3F",
             },{ 
               name: ' Gifted ' ,
               y: 0.5351242936297972 , 
               color:"#900C3F",
             },{ 
               name: ' Amour ' ,
               y: 0.5325529517850589 , 
               color:"#900C3F",
             },{ 
               name: ' Courageous ' ,
               y: 0.5093735254359923 , 
               color:"#900C3F",
             },{ 
               name: ' Suffragette ' ,
               y: 0.4928561067635958 , 
               color:"#900C3F",
             },{ 
               name: ' The Perks of Being a Wallflower ' ,
               y: 0.47873254625720496 , 
               color:"#900C3F",
             },{ 
               name: ' Project Almanac ' ,
               y: 0.4764183096436486 , 
               color:"#900C3F",
             },{ 
               name: ' Mud ' ,
               y: 0.45683893559696653 , 
               color:"#900C3F",
             },{ 
               name: ' The Bye Bye Man ' ,
               y: 0.45149372363135043 , 
               color:"#900C3F",
             },{ 
               name: ' Victor Frankenstein ' ,
               y: 0.4505764833471335 , 
               color:"#900C3F",
             },{ 
               name: ' Draft Day ' ,
               y: 0.43209139896402404 , 
               color:"#900C3F",
             },{ 
               name: ' Upside Down ' ,
               y: 0.38199583442621154 , 
               color:"#900C3F",
             },{ 
               name: ' Wish Upon ' ,
               y: 0.339873137851304 , 
               color:"#900C3F",
             },{ 
               name: ' The Light Between Oceans ' ,
               y: 0.32256471524693753 , 
               color:"#900C3F",
             },{ 
               name: ' Black or White ' ,
               y: 0.3180666765993829 , 
               color:"#900C3F",
             },{ 
               name: ' Country Strong ' ,
               y: 0.29824771406305767 , 
               color:"#900C3F",
             },{ 
               name: ' Before I Fall ' ,
               y: 0.27426990069261226 , 
               color:"#900C3F",
             },{ 
               name: ' The Space Between Us ' ,
               y: 0.2385953917344334 , 
               color:"#900C3F",
             },{ 
               name: ' The Words ' ,
               y: 0.23697849515387473 , 
               color:"#900C3F",
             },{ 
               name: ' Paranoia ' ,
               y: 0.23655954102153195 , 
               color:"#900C3F",
             },{ 
               name: ' Anonymous ' ,
               y: 0.22895553799615617 , 
               color:"#900C3F",
             },{ 
               name: ' Ida ' ,
               y: 0.2214689405690285 , 
               color:"#900C3F",
             },{ 
               name: ' Labor Day ' ,
               y: 0.20542085475670635 , 
               color:"#900C3F",
             },{ 
               name: ' Midnight Special ' ,
               y: 0.1111842596726238 , 
               color:"#900C3F",
             },{ 
               name: ' Rabbit Hole ' ,
               y: 0.0898280221081893 , 
               color:"#900C3F",
             },{ 
               name: ' Mustang ' ,
               y: 0.08038273449080661 , 
               color:"#900C3F",
             },{ 
               name: ' The Beaver ' ,
               y: 0.07304973427667674 , 
               color:"#900C3F",
             },{ 
               name: ' Like Crazy ' ,
               y: 0.05397472657314678 , 
               color:"#900C3F",
             },{ 
               name: ' Another Earth ' ,
               y: 0.03044120146536895 , 
               color:"#900C3F",
             },{ 
               name: ' Anna ' ,
               y: 0.017371975079867003 , 
               color:"#900C3F",
             },{ 
               name: ' Maggie ' ,
               y: 0.014878517590070093 , 
               color:"#900C3F",
    

    Putting the number of tickets sold by every movie, across all System Ratings, into a single list.

    In [813]:
    ttl = var1+var6+var9+var12+var
    print(ttl)
    
    [2041284, 1735627, 100840, 27784, 161478, 2041222, 2035075, 9841006, 49606, 1512116, 102215, 6709192, 2041284, 1946584, 3775075, 1530711, 2041284, 1946584, 1656624, 231503, 382224, 21312000, 465911, 6516743, 123684, 266194, 20557, 2041284, 209430, 345342, 278354, 10309, 5028356, 389424, 203892, 69087, 900000, 2041222, 10117304, 3614771, 41380, 6516743, 574645, 62729, 191417, 256182, 100840, 102215, 147081, 18004778, 9606872, 1582698, 30460471, 9267895, 7397524, 1223150, 970960, 4691829, 54235135, 7398690, 30593772, 21660121, 3810299, 2711800, 53482, 3730633, 4749492, 1934462, 3874173, 11483011, 4354536, 1894842, 343874, 13758706, 6460576, 3347330, 8913705, 852629, 6466787, 10626997, 3565613, 398777, 702550, 15203638, 17112033, 1383513, 1485939, 13458278, 610182, 6395497, 1076996, 3225544, 1516446, 12795619, 281948, 4344029, 1781521, 15729752, 3585605, 11928543, 4071696, 1492078, 328123, 1492375, 12505269, 54936832, 666802, 19908, 6489267, 478679, 844312, 204489, 240000, 170591, 8000894, 4800000, 19549, 241114, 102523, 1858714, 872124, 4030, 1001545, 8069354, 43865684, 6694795, 2746962, 3779964, 32550000, 24610000, 375000, 6913186, 451700, 14398571, 1001545, 1765797, 8049152, 31128100, 28621420, 9048232, 98621487, 26800000, 7207164, 10900, 4770742, 3019441, 6550000, 760038, 1200000, 59286, 69369867, 63445479, 13755159, 4781891, 9055268, 21359152, 17974888, 10866027, 4164217, 2638704, 7100463, 20312789, 4847808, 16249834, 16959061, 1634077, 3112437, 11680972, 17356758, 9714399, 8530909, 25227693, 6172183, 6380293, 16555229, 19761816, 6898454, 1648140, 9405095, 1581551, 4105942, 21312000, 14203351, 9663383, 6654020, 2984748, 8291728, 6428288, 504604, 20826520, 2228173, 7608671, 1418981, 768025, 33452229, 3802823, 5254571, 5650612, 12895590, 2060199, 5916869, 3404491, 3306930, 3290944, 2347734, 7835617, 6207614, 6160314, 3155696, 3678704, 8183187, 2197102, 3118773, 3696466, 1636971, 14880651, 4169961, 1894568, 620503, 102776, 120000, 1529836, 3518588, 555258, 
372840, 210278, 14263436, 44994832, 5446297, 36856719, 8415403, 38139849, 37135062, 7496685, 13461244, 57099810, 5064742, 2468752, 16055844, 679277, 7773592, 3239868, 3105473, 406502, 3801787, 4660405, 2827040, 372775, 771963, 821757, 758501, 1117372, 52873, 33126671, 3835839, 3626278, 1183113, 1985917, 3583071, 1203491, 4284352, 5617894, 7013390, 217962, 2181730, 7773387, 1076528, 38295, 1749924, 282101, 1753600, 497202, 5727305, 67948, 4045452, 2043323, 3896904, 2325193, 1661076, 1613155, 1129532, 1015342, 208839, 632852, 2127029, 1424493, 1656624, 543891, 115631, 85240, 6238, 42945, 276978, 5476692, 7721184, 3471817, 195168, 63680, 244758, 917129, 325608, 1300000, 1100000]
    

    Adding together the number of tickets sold by every movie across all System Ratings and storing the total in a variable.

    In [816]:
    tt2 = 0
    for i in ttl:tt2+=i
    print(tt2)
    
    2121357473
    
    In [760]:
    j = 690767742+377168119+499784567+103856414+449780631
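
The five hard-coded numbers in the cell above are per-rating ticket totals, and their sum should equal the grand total `tt2`. A small sketch of that cross-check follows. The list name `rating_totals` is illustrative, and the values are copied from the cells above.

```python
# Per-rating totals as hard-coded in the cell above
rating_totals = [690767742, 377168119, 499784567, 103856414, 449780631]
tt2 = 2121357473  # grand total printed by the previous cell

j = sum(rating_totals)

# Fail loudly if the per-rating totals drift from the grand total
assert j == tt2, f"per-rating totals ({j}) do not match the grand total ({tt2})"
print(j)
```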
    

    Getting the percentage of all tickets sold in this dataframe that belonged to the R-rated category.

    In [817]:
    v = 0
    for i in var:
        v+=i
    (v/tt2*100)
    
    Out[817]:
    21.20249117485726

    Getting the average number of tickets sold per R-rated Drama movie.

    In [822]:
    v/len(var)
    
    Out[822]:
    5841306.896103896

    Getting the percentage of the tickets sold in this dataframe that belong to the NC-17 rated category.

    In [819]:
    v1 = 0
    for i in var3:
        v1+=i
    (v1/tt2*100)
    
    Out[819]:
    4.895752616984794

    Getting the average number of tickets sold per NC-17 rated Drama movie.

    In [823]:
    v1/len(var3)
    
    Out[823]:
    2119518.6530612246

    Getting the percentage of the tickets sold in this dataframe that belong to the PG-rated category.

    In [820]:
    v2 = 0
    for i in var6:
        v2+=i
    v2/tt2*100
    
    Out[820]:
    23.559658066172613

    Getting the average number of tickets sold per PG-rated Drama movie.

    In [824]:
    v2/len(var6)
    
    Out[824]:
    7459471.149253732

    Getting the percentage of the tickets sold in this dataframe that belong to the G-rated category.

    In [821]:
    v3 = 0
    for i in var9:
        v3+=i
    v3/tt2*100
    
    Out[821]:
    17.779564444016742

    Getting the average number of tickets sold per G-rated Drama movie.

    In [825]:
    v3/len(var9)
    
    Out[825]:
    11093179.970588235

    Getting the percentage of the tickets sold in this dataframe that belong to the PG-13 rated category.

    In [827]:
    v4= 0
    for i in var12:
        v4+=i
    v4/tt2*100
    
    Out[827]:
    32.56253369796859

    Getting the average number of tickets sold per PG-13 rated Drama movie.

    In [828]:
    v4/len(var12)
    
    Out[828]:
    9089049.236842105
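
    The five percentage cells above can be cross-checked in one pass. The sketch below reuses the five per-rating ticket totals that appear as the terms of the sum in cell In [760]; which total belongs to which rating is an inference made by matching the results against the printed percentages, and the names `tickets_by_rating` and `share_by_rating` are illustrative, not part of the notebook.

```python
# Per-rating ticket totals copied from the terms summed in cell In [760].
# The rating attached to each total is inferred from the printed percentages.
tickets_by_rating = {
    "PG-13": 690767742,
    "G": 377168119,
    "PG": 499784567,
    "NC-17": 103856414,
    "R": 449780631,
}

# Grand total across all ratings (matches the printed value of tt2).
grand_total = sum(tickets_by_rating.values())

# Share of all tickets, in percent, for each rating.
share_by_rating = {
    rating: total / grand_total * 100
    for rating, total in tickets_by_rating.items()
}

# Print the ratings from largest to smallest share.
for rating, share in sorted(share_by_rating.items(), key=lambda kv: -kv[1]):
    print(f"{rating:6s} {share:6.2f}%")
```

    Because every share is divided by the same grand total, the five percentages should add up to 100, which is a quick sanity check on the individual cells.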

    Getting the Amount of Tickets from movies that made Profit that are R-rated.

    In [323]:
    r_ticks = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='R':
                r_ticks.append(Drama_DataFrame.Tickets[i])
    print(r_ticks)
    
    [44994832, 36856719, 8415403, 38139849, 37135062, 7496685, 13461244, 57099810, 5064742, 16055844, 7773592, 3239868, 3105473, 3801787, 4660405, 2827040, 33126671, 3835839, 3626278, 1985917, 3583071, 1203491, 4284352, 5617894, 7013390, 2181730, 7773387, 1076528, 1749924, 1753600, 497202, 5727305, 4045452, 2043323, 3896904, 2325193, 1661076, 1613155, 1129532, 1015342, 208839, 632852, 2127029, 1424493, 1656624, 543891, 115631, 42945, 276978, 5476692, 7721184, 3471817, 195168, 325608, 1300000, 1100000]
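
    The loop above is repeated for every rating and every column in the cells that follow. As a rough sketch, the same filter could be written once as a helper. The function name `profitable_values` and the toy rows are hypothetical and are not taken from 'Drama_DataFrame'.

```python
def profitable_values(ratings, profits, values, rating):
    """Return values[i] for rows with the given rating and a non-negative profit."""
    return [
        value
        for row_rating, profit, value in zip(ratings, profits, values)
        # Same rule as the notebook loop: skip only rows with profit < 0.
        if profit >= 0 and row_rating == rating
    ]

# Hypothetical toy rows, not taken from Drama_DataFrame.
ratings = ["R", "PG", "R", "G", "R"]
profits = [500, -20, -75, 10, 0]
tickets = [50, 8, 12, 4, 3]

r_ticks_demo = profitable_values(ratings, profits, tickets, "R")
print(r_ticks_demo)  # [50, 3]
```

    Note that a movie with a profit of exactly 0 passes the `< 0` test and is kept, which is why break-even titles can show up with an ROI of 0.0 in the lists below.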
    

    Getting the Budget from movies that made Profit that are R-rated.

    In [324]:
    r_bud = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='R':
                r_bud.append(Drama_DataFrame.Production_Budget[i])
    print(r_bud)
    
    [100000000.0, 61000000.0, 60000000.0, 55000000.0, 55000000.0, 55000000.0, 52500000.0, 40000000.0, 37500000.0, 31000000.0, 23000000.0, 22500000.0, 22500000.0, 21000000.0, 20000000.0, 20000000.0, 13000000.0, 13000000.0, 13000000.0, 12000000.0, 12000000.0, 12000000.0, 11800000.0, 11000000.0, 10000000.0, 9400000.0, 8500000.0, 7000000.0, 5000000.0, 4900000.0, 4750000.0, 4000000.0, 3500000.0, 3400000.0, 3300000.0, 3000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 1987650.0, 1500000.0, 1000000.0, 1000000.0, 1000000.0, 135000.0, 100000.0, 6000000.0, 8500000.0, 20000000.0, 100000.0, 2700000.0, 11500000.0, 9000000.0]
    

    Getting the Return On Investment for the movies that made Profit that are R-rated.

    In [325]:
    percent_return_on_investment = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='R':
                # Use a separate name so the loop index i is not overwritten.
                roi = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
                percent_return_on_investment.append(round(roi,0))
    print(percent_return_on_investment)
    
    [350.0, 504.0, 40.0, 593.0, 575.0, 36.0, 156.0, 1327.0, 35.0, 418.0, 238.0, 44.0, 38.0, 81.0, 133.0, 41.0, 2448.0, 195.0, 179.0, 65.0, 199.0, 0.0, 263.0, 411.0, 601.0, 132.0, 815.0, 54.0, 250.0, 258.0, 5.0, 1332.0, 1056.0, 501.0, 1081.0, 675.0, 731.0, 707.0, 465.0, 408.0, 4.0, 216.0, 970.0, 850.0, 1557.0, 444.0, 16.0, 218.0, 2670.0, 813.0, 808.0, 74.0, 1852.0, 21.0, 13.0, 22.0]
    

    Getting the Net Profit Margin for the movies that made Profit that are R-rated.

    In [326]:
    net_profit = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x == 'R':
                net_profit.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i])*100))
    print(net_profit)
    
    [77, 83, 28, 85, 85, 26, 60, 92, 25, 80, 70, 30, 27, 44, 57, 29, 96, 66, 64, 39, 66, 0, 72, 80, 85, 56, 89, 34, 71, 72, 4, 93, 91, 83, 91, 87, 87, 87, 82, 80, 4, 68, 90, 89, 93, 81, 13, 68, 96, 89, 88, 42, 94, 17, 11, 18]
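
    The two ratios used in the cells above can be written out explicitly. This is a minimal sketch using the budget and profit printed for 'Django Unchained' in the 'df_data' table later in this notebook. The worldwide gross is assumed here to be budget plus profit, which this section does not state, and the function names are illustrative.

```python
def roi_percent(profit, budget):
    """Return On Investment in percent, rounded as in the notebook cells."""
    return round(profit / budget * 100, 0)

def npm_percent(profit, gross):
    """Net Profit Margin in percent, truncated with int() as in the notebook cells."""
    return int(profit / gross * 100)

budget = 100_000_000.0
profit = 349_948_323
gross = budget + profit  # assumption: Profit = Worldwide_Gross - Production_Budget

print(roi_percent(profit, budget))  # 350.0
print(npm_percent(profit, gross))   # 77
```

    ROI is rounded while NPM is truncated, so an NPM of 77.8 is stored as 77 rather than 78.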
    

    Printing out 'R' 56 times for the R-rated category in the Javascript graph below.

    In [327]:
    system1 = []
    for i in range(56):
        system1.append('R')
    print(system1)
    
    ['R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R']
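
    A label list like this can also be sized from the data instead of a hard-coded count, so it cannot drift out of step with the ticket list. A small sketch, using only the first three R-rated ticket values printed earlier as sample data (the `_demo` names are illustrative):

```python
# First three R-rated ticket values from the output above, used as sample data.
r_ticks_demo = [44994832, 36856719, 8415403]

# One 'R' label per ticket value; list multiplication replaces the append loop.
system1_demo = ["R"] * len(r_ticks_demo)
print(system1_demo)  # ['R', 'R', 'R']
```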
    

    Getting the Amount of Tickets from movies that made Profit that are PG-rated.

    In [328]:
    pg_ticks = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='PG':
                pg_ticks.append(Drama_DataFrame.Tickets[i])
    print(pg_ticks)
    
    [18004778, 9606872, 30460471, 9267895, 7397524, 1223150, 970960, 4691829, 54235135, 7398690, 30593772, 21660121, 3810299, 2711800, 4749492, 1934462, 3874173, 11483011, 1894842, 13758706, 6460576, 3347330, 8913705, 6466787, 10626997, 15203638, 17112033, 1383513, 13458278, 610182, 6395497, 1516446, 12795619, 4344029, 1781521, 15729752, 3585605, 11928543, 4071696, 1492375, 12505269, 54936832, 6489267, 844312, 8000894, 4800000]
    

    Getting the Budget from movies that made Profit that are PG-rated.

    In [329]:
    pg_bud = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='PG':
                pg_bud.append(Drama_DataFrame.Production_Budget[i])
    print(pg_bud)
    
    [180000000.0, 37000000.0, 20000000.0, 20000000.0, 3000000.0, 1700000.0, 5100000.0, 10000000.0, 95000000.0, 3000000.0, 20000000.0, 40000000.0, 5000000.0, 422000.0, 11800000.0, 15000000.0, 32000000.0, 40000000.0, 8000000.0, 17000000.0, 30000000.0, 500000.0, 20000000.0, 2000000.0, 23000000.0, 32000000.0, 90000000.0, 10000000.0, 16000000.0, 3000000.0, 15000000.0, 10000000.0, 20000000.0, 12000000.0, 5000000.0, 7000000.0, 14000000.0, 15000000.0, 12000000.0, 7500000.0, 17000000.0, 5000000.0, 22000000.0, 4500000.0, 8200000.0, 28000000.0]
    

    Getting the Return On Investment for the movies that made Profit that are PG-rated.

    In [330]:
    percent_return_on_investment1 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='PG':
                # Use a separate name so the loop index i is not overwritten.
                roi = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
                percent_return_on_investment1.append(round(roi,0))
    print(percent_return_on_investment1)
    
    [0.0, 160.0, 1423.0, 363.0, 2366.0, 620.0, 90.0, 369.0, 471.0, 2366.0, 1430.0, 442.0, 662.0, 6326.0, 302.0, 29.0, 21.0, 187.0, 137.0, 709.0, 115.0, 6595.0, 346.0, 3133.0, 362.0, 375.0, 90.0, 38.0, 741.0, 103.0, 326.0, 52.0, 540.0, 262.0, 256.0, 2147.0, 156.0, 695.0, 239.0, 99.0, 636.0, 10887.0, 195.0, 88.0, 876.0, 71.0]
    

    Getting the Net Profit Margin for the movies that made Profit that are PG-rated.

    In [331]:
    net_profit1 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x == 'PG':
                net_profit1.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i])*100))
    print(net_profit1)
    
    [0, 61, 93, 78, 95, 86, 47, 78, 82, 95, 93, 81, 86, 98, 75, 22, 17, 65, 57, 87, 53, 98, 77, 96, 78, 78, 47, 27, 88, 50, 76, 34, 84, 72, 71, 95, 60, 87, 70, 49, 86, 99, 66, 46, 89, 41]
    

    Printing out 'PG' 46 times for the PG-rated category in the Javascript graph below.

    In [332]:
    system2 = []
    for i in range(46):
        system2.append('PG')
    print(system2)
    
    ['PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG']
    

    Getting the Amount of Tickets from movies that made Profit that are G-rated.

    In [333]:
    g_ticks = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='G':
                g_ticks.append(Drama_DataFrame.Tickets[i])
    print(g_ticks)
    
    [241114, 1858714, 8069354, 43865684, 6694795, 2746962, 3779964, 32550000, 24610000, 375000, 451700, 14398571, 1765797, 8049152, 31128100, 28621420, 9048232, 98621487, 26800000, 7207164, 4770742, 3019441, 6550000, 760038, 1200000]
    

    Getting the Budget from movies that made Profit that are G-rated.

    In [334]:
    g_bud = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='G':
                g_bud.append(Drama_DataFrame.Production_Budget[i])
    print(g_bud)
    
    [700000.0, 7000000.0, 22000000.0, 20000000.0, 23000000.0, 15000000.0, 2700000.0, 70000000.0, 30000000.0, 2500000.0, 666000.0, 85000000.0, 10000000.0, 22000000.0, 18000000.0, 8200000.0, 60000000.0, 45000000.0, 858000.0, 17000000.0, 10000000.0, 6400000.0, 13000000.0, 1750000.0, 1700000.0]
    

    Getting the Return On Investment for the movies that made Profit that are G-rated.

    In [335]:
    percent_return_on_investment2 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='G':
                # Use a separate name so the loop index i is not overwritten.
                roi = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
                percent_return_on_investment2.append(round(roi,0))
    print(percent_return_on_investment2)
    
    [244.0, 166.0, 267.0, 2093.0, 191.0, 83.0, 1300.0, 365.0, 720.0, 50.0, 578.0, 69.0, 77.0, 266.0, 1629.0, 3390.0, 51.0, 2092.0, 31135.0, 324.0, 377.0, 372.0, 404.0, 334.0, 606.0]
    

    Getting the Net Profit Margin for the movies that made Profit that are G-rated.

    In [336]:
    net_profit2 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x == 'G':
                net_profit2.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i])*100))
    print(net_profit2)
    
    [70, 62, 72, 95, 65, 45, 92, 78, 87, 33, 85, 40, 43, 72, 94, 97, 33, 95, 99, 76, 79, 78, 80, 76, 85]
    

    Printing out 'G' 25 times for the G-rated category in the Javascript graph below.

    In [337]:
    system3 = []
    for i in range(25):
        system3.append('G')
    print(system3)
    
    ['G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G']
    

    Getting the Amount of Tickets from movies that made Profit that are PG-13 rated.

    In [338]:
    pg13_ticks = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='PG-13':
                pg13_ticks.append(Drama_DataFrame.Tickets[i])
    print(pg13_ticks)
    
    [69369867, 63445479, 13755159, 9055268, 21359152, 17974888, 10866027, 7100463, 20312789, 4847808, 16249834, 16959061, 11680972, 17356758, 9714399, 8530909, 25227693, 6172183, 6380293, 16555229, 19761816, 6898454, 9405095, 4105942, 21312000, 14203351, 9663383, 6654020, 2984748, 8291728, 6428288, 20826520, 2228173, 7608671, 33452229, 3802823, 5254571, 5650612, 12895590, 2060199, 5916869, 3404491, 3306930, 3290944, 2347734, 7835617, 6207614, 6160314, 3155696, 3678704, 8183187, 2197102, 3118773, 3696466, 1636971, 14880651, 4169961, 1894568, 620503, 1529836, 3518588, 555258, 372840, 210278]
    

    Getting the Budget from movies that made Profit that are PG-13 rated.

    In [339]:
    pg13_bud = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='PG-13':
                pg13_bud.append(Drama_DataFrame.Production_Budget[i])
    print(pg13_bud)
    
    [110000000.0, 75000000.0, 60000000.0, 55000000.0, 50000000.0, 50000000.0, 50000000.0, 49000000.0, 47000000.0, 44000000.0, 40000000.0, 40000000.0, 38000000.0, 37000000.0, 37000000.0, 36000000.0, 35000000.0, 35000000.0, 34000000.0, 33000000.0, 30000000.0, 30000000.0, 28000000.0, 26000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 24000000.0, 20000000.0, 20000000.0, 19000000.0, 17000000.0, 17000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 15000000.0, 14000000.0, 13000000.0, 12000000.0, 12000000.0, 11000000.0, 11000000.0, 10000000.0, 10000000.0, 9700000.0, 9000000.0, 9000000.0, 7400000.0, 7000000.0, 6000000.0, 5000000.0, 5000000.0, 5000000.0, 5000000.0, 2600000.0, 2000000.0, 1400000.0, 250000.0, 175000.0]
    

    Getting the Return On Investment for the movies that made Profit that are PG-13 rated.

    In [340]:
    percent_return_on_investment3 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='PG-13':
                # Use a separate name so the loop index i is not overwritten.
                roi = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
                percent_return_on_investment3.append(round(roi,0))
    print(percent_return_on_investment3)
    
    [531.0, 746.0, 129.0, 65.0, 327.0, 259.0, 117.0, 45.0, 332.0, 10.0, 306.0, 324.0, 207.0, 369.0, 163.0, 137.0, 621.0, 76.0, 88.0, 402.0, 559.0, 130.0, 236.0, 58.0, 752.0, 468.0, 287.0, 166.0, 19.0, 232.0, 168.0, 941.0, 11.0, 300.0, 1868.0, 124.0, 228.0, 253.0, 760.0, 37.0, 294.0, 143.0, 154.0, 174.0, 96.0, 612.0, 464.0, 516.0, 216.0, 279.0, 809.0, 144.0, 321.0, 428.0, 173.0, 2876.0, 734.0, 279.0, 24.0, 488.0, 1659.0, 297.0, 1391.0, 1102.0]
    

    Getting the Net Profit Margin for the movies that made Profit that are PG-13 rated.

    In [341]:
    net_profit3 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x == 'PG-13':
                net_profit3.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i])*100))
    print(net_profit3)
    
    [84, 88, 56, 39, 76, 72, 53, 30, 76, 9, 75, 76, 67, 78, 61, 57, 86, 43, 46, 80, 84, 56, 70, 36, 88, 82, 74, 62, 16, 69, 62, 90, 10, 75, 94, 55, 69, 71, 88, 27, 74, 58, 60, 63, 48, 85, 82, 83, 68, 73, 89, 59, 76, 81, 63, 96, 88, 73, 19, 83, 94, 74, 93, 91]
    

    Printing out 'PG-13' 64 times for the PG-13 rated category in the Javascript graph below.

    In [342]:
    system4 = []
    for i in range(64):
        system4.append('PG-13')
    print(system4)
    
    ['PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13']
    

    Getting the Amount of Tickets from movies that made Profit that are NC-17 rated.

    In [343]:
    nc17_ticks = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='NC-17':
                nc17_ticks.append(Drama_DataFrame.Tickets[i])
    print(nc17_ticks)
    
    [2041284, 1735627, 100840, 27784, 161478, 2041222, 9841006, 1512116, 6709192, 2041284, 1946584, 1530711, 2041284, 1946584, 1656624, 231503, 382224, 21312000, 6516743, 266194, 2041284, 345342, 5028356, 389424, 203892, 900000, 2041222, 10117304, 3614771, 41380, 6516743, 574645, 100840, 147081]
    

    Getting the Budget from movies that made Profit that are NC-17 rated.

    In [344]:
    nc17_bud = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='NC-17':
                nc17_bud.append(Drama_DataFrame.Production_Budget[i])
    print(nc17_bud)
    
    [6500000.0, 12500000.0, 1000000.0, 20000.0, 955472.0, 1500000.0, 9000000.0, 15000000.0, 15000000.0, 6500000.0, 4000000.0, 15000000.0, 6500000.0, 4074940.0, 1000000.0, 1000000.0, 3565572.0, 12000000.0, 15000000.0, 350000.0, 6500000.0, 904765.0, 34000000.0, 230000.0, 1000000.0, 1000000.0, 1500000.0, 6500000.0, 1250000.0, 12000.0, 15000000.0, 2200000.0, 50000.0, 612072.0]
    

    Getting the Return On Investment for the movies that made Profit that are NC-17 rated.

    In [345]:
    percent_return_on_investment4 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x =='NC-17':
                # Use a separate name so the loop index i is not overwritten.
                roi = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
                percent_return_on_investment4.append(round(roi,0))
    print(percent_return_on_investment4)
    
    [214.0, 39.0, 1.0, 1289.0, 69.0, 1261.0, 993.0, 1.0, 347.0, 214.0, 387.0, 2.0, 214.0, 378.0, 1557.0, 132.0, 7.0, 1676.0, 334.0, 661.0, 214.0, 282.0, 48.0, 1593.0, 104.0, 800.0, 1261.0, 1457.0, 2792.0, 3348.0, 334.0, 161.0, 1917.0, 140.0]
    

    Getting the Net Profit Margin for the movies that made Profit that are NC-17 rated.

    In [346]:
    # Net Profit Margin (%) = Profit / Worldwide_Gross * 100, truncated to an int
    net_profit4 = []
    for i,x in enumerate(Drama_DataFrame.Rating):
            if Drama_DataFrame.Profit[i] < 0: continue
            elif x == 'NC-17':
                net_profit4.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i])*100))
    print(net_profit4)
    
    [68, 27, 0, 92, 40, 92, 90, 0, 77, 68, 79, 2, 68, 79, 93, 56, 6, 94, 76, 86, 68, 73, 32, 94, 50, 88, 92, 93, 96, 97, 76, 61, 95, 58]
    

    Printing out 'NC-17' 34 times for the NC-17 rated category in the Javascript graph below.

    In [347]:
    system5 = []
    for i in range(34):
        system5.append('NC-17')
    print(system5)
    
    ['NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17']
    

    Creating a dataframe called 'df_data' that will be used to get some findings for the Javascript graphs below. This dataframe consists of the Name, Budget, Profit, Return On Investment, Number of Tickets, Net Profit Margin, System Rating and release Season of the movies that made a Profit in the 'Drama_DataFrame' dataframe, which was created at the beginning of this project.

    In [397]:
    df_data = pd.DataFrame({'Budget':r_bud+pg_bud+g_bud+pg13_bud+nc17_bud,
                       'Season':season_r+season_pg+season_g+season_pg13+season_nc17,
                       "Profit":profit_int+profit_int1+profit_int2+profit_int3+profit_int4,
                       "Name":name+name1+name2+name3+name4,
                       "No.Tickets":r_ticks+pg_ticks+g_ticks+pg13_ticks+nc17_ticks,
                       "ROI":percent_return_on_investment+percent_return_on_investment1+
                       percent_return_on_investment2+percent_return_on_investment3+
                       percent_return_on_investment4,
                       "NPM":net_profit+net_profit1+net_profit2+net_profit3+net_profit4,
                       "System":system1+system2+system3+system4+system5
                       })
    

    This is the 'df_data' dataframe.

    In [398]:
    df_data
    
    Out[398]:
    Budget Season Profit Name No.Tickets ROI NPM System
    0 100000000.0 1 349948323 Django Unchained 44994832 350.0 77 R
    1 61000000.0 4 307567189 Gone Girl 36856719 504.0 83 R
    2 60000000.0 2 24154026 Priest 8415403 40.0 28 R
    3 55000000.0 1 326398492 Fifty Shades Darker 38139849 593.0 85 R
    4 55000000.0 1 316350619 Fifty Shades Freed 37135062 575.0 85 R
    ... ... ... ... ... ... ... ... ...
    220 12000.0 2 401802 Pink Flamingos 41380 3348.0 97 NC-17
    221 15000000.0 4 50167430 Lust, Caution 6516743 334.0 76 NC-17
    222 2200000.0 4 3546453 Happiness 1998 574645 161.0 61 NC-17
    223 50000.0 4 958404 Whore 1991 100840 1917.0 95 NC-17
    224 612072.0 2 858737 Law of Desire 147081 140.0 58 NC-17

    225 rows × 8 columns
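
    The 225 rows are the five per-rating groups laid end to end (56 + 46 + 25 + 64 + 34), so every concatenated column has to come out at the same length; pandas generally refuses to build a dataframe from lists of different lengths. A small sketch of checking this before the dataframe is built, where `check_equal_lengths` and the placeholder ticket column are hypothetical:

```python
def check_equal_lengths(columns):
    """Raise ValueError if the lists in `columns` do not all share one length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return next(iter(lengths.values()))

# Group sizes taken from the range(...) calls in the label cells above.
group_sizes = {"R": 56, "PG": 46, "G": 25, "PG-13": 64, "NC-17": 34}

# Rebuild the System column: each label repeated once per movie in its group.
system_demo = [label for label, n in group_sizes.items() for _ in range(n)]

# Placeholder second column of the same length, standing in for real data.
tickets_demo = [0] * sum(group_sizes.values())

n_rows = check_equal_lengths({"System": system_demo, "No.Tickets": tickets_demo})
print(n_rows)  # 225
```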

    Checking which season each movie in the R-rated category was released in. Based on the output below (1 = Winter, 2 = Spring, 3 = Summer, 4 = Autumn), 14 movies were released in 'Winter', 24 in 'Autumn', 11 in 'Spring' and 7 in 'Summer'.

    In [941]:
    collections.Counter(season_r)
    
    Out[941]:
    Counter({1: 14, 4: 24, 2: 11, 3: 7})
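
    The numeric season codes are easier to read once they are mapped to names. A minimal sketch, assuming the coding implied by the text above (1 = Winter, 2 = Spring, 3 = Summer, 4 = Autumn); `SEASON_NAMES` is an illustrative name, not a variable from the notebook.

```python
from collections import Counter

# Season coding as implied by the surrounding text.
SEASON_NAMES = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Autumn"}

# The counts printed by collections.Counter(season_r) above.
season_counts = Counter({1: 14, 4: 24, 2: 11, 3: 7})

# Relabel the codes, ordered by season code.
named_counts = {SEASON_NAMES[code]: n for code, n in sorted(season_counts.items())}
print(named_counts)  # {'Winter': 14, 'Spring': 11, 'Summer': 7, 'Autumn': 24}
```

    The four counts add up to 56, the number of R-rated movies that made a profit.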

    Getting the index of the R-rated movies that were released in Winter, Spring, Summer and Autumn.

    In [449]:
    index_one = []
    index_two = []
    index_three = []
    index_four = []
    for x,i in enumerate(df_data.System):
        if i == 'R':
            if df_data['Season'][x] ==1: index_one.append(x)
            elif df_data['Season'][x] ==2: index_two.append(x)
            elif df_data['Season'][x] ==3: index_three.append(x)
            elif df_data['Season'][x] ==4: index_four.append(x)
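
    The four index lists can also be kept in a single dictionary keyed by season code, which avoids one variable per season. The sketch below uses plain lists in place of the 'df_data' columns. The helper name and the toy data are hypothetical.

```python
from collections import defaultdict

def indices_by_season(systems, seasons, rating):
    """Map each season code to the row indices of movies with the given rating."""
    groups = defaultdict(list)
    for idx, (system, season) in enumerate(zip(systems, seasons)):
        if system == rating:
            groups[season].append(idx)
    return dict(groups)

# Hypothetical toy columns, not taken from df_data.
systems = ["R", "R", "PG", "R", "NC-17", "R"]
seasons = [1, 4, 2, 1, 3, 2]

r_index_demo = indices_by_season(systems, seasons, "R")
print(r_index_demo)  # {1: [0, 3], 4: [1], 2: [5]}
```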
    

    Getting the number of Tickets sold in the Winter by R-rated Drama movies.

    In [450]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['No.Tickets'][i]))
    total1 = sum(sum1)
    total1
    
    Out[450]:
    240614259

    Getting the Average Net Profit Margin made in the Winter by R-rated Drama movies.

    In [451]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['NPM'][i]))
    total2 = sum(sum1)
    total2/14
    
    Out[451]:
    72.5

    Getting the Amount of Expenses spent in the Winter to produce R-rated Drama movies.

    In [452]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['Budget'][i]))
    total3 = sum(sum1)
    total3
    
    Out[452]:
    357700000

    Getting the number of Tickets sold in the Spring by R-rated Drama movies.

    In [453]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['No.Tickets'][i]))
    total4 = sum(sum2)
    total4
    
    Out[453]:
    30059567

    Getting the Average Net Profit Margin made in the Spring by R-rated Drama movies.

    In [454]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['NPM'][i]))
    total5 = sum(sum2)
    total5/11
    
    Out[454]:
    50.63636363636363

    Getting the Amount of Expenses spent in the Spring to produce R-rated Drama movies.

    In [455]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['Budget'][i]))
    total6 = sum(sum2)
    total6
    
    Out[455]:
    125635000

    Getting the number of Tickets sold in the Summer by R-rated Drama movies.

    In [456]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['No.Tickets'][i]))
    total7 = sum(sum3)
    total7
    
    Out[456]:
    23778392

    Getting the Average Net Profit Margin made in the Summer by R-rated Drama movies.

    In [457]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['NPM'][i]))
    total8 = sum(sum3)
    total8/7
    
    Out[457]:
    77.14285714285714

    Getting the Amount of Expenses spent in the Summer to produce R-rated Drama movies.

    In [458]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['Budget'][i]))
    total9 = sum(sum3)
    total9
    
    Out[458]:
    58100000

    Getting the number of Tickets sold in the Autumn by R-rated Drama movies.

    In [459]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['No.Tickets'][i]))
    total10 = sum(sum4)
    total10
    
    Out[459]:
    125062444

    Getting the Average Net Profit Margin made in the Autumn by R-rated Drama movies.

    In [460]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['NPM'][i]))
    total11 = sum(sum4)
    total11/24
    
    Out[460]:
    59.25

    Getting the Amount of Expenses spent in the Autumn to produce R-rated Drama movies.

    In [461]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['Budget'][i]))
    total12 = sum(sum4)
    total12
    
    Out[461]:
    375637650

    Putting the average number of Tickets sold per movie in Winter, Spring, Summer and Autumn by R-rated Drama Movies in a list for the Javascript graph below.

    In [462]:
    r_total_tick = [total1/14, total4/11, total7/7, total10/24]
    print(r_total_tick)
    
    [17186732.785714287, 2732687.909090909, 3396913.1428571427, 5210935.166666667]
    

    Putting the average Net Profit Margin made in Winter, Spring, Summer and Autumn by R-rated Drama Movies in a list for the Javascript graph below.

    In [463]:
    r_total_pro = [total2/14, total5/11, total8/7, total11/24]
    print(r_total_pro)
    
    [72.5, 50.63636363636363, 77.14285714285714, 59.25]
    

    Putting the average Expenses spent per movie in Winter, Spring, Summer and Autumn by R-rated Drama Movies in a list for the Javascript graph below.

    In [464]:
    r_total_bud = [total3/14, total6/11, total9/7, total12/24]
    print(r_total_bud)
    
    [25550000.0, 11421363.636363637, 8300000.0, 15651568.75]
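
    The three lists above divide each seasonal total by a hand-typed movie count. As a sketch, the same per-movie averages can be derived from the totals and counts printed earlier in this section, so the divisor always comes from one place. The dictionary names and `per_movie_average` are illustrative.

```python
# R-rated counts and totals copied from the outputs above.
season_counts = {"Winter": 14, "Spring": 11, "Summer": 7, "Autumn": 24}
ticket_totals = {"Winter": 240614259, "Spring": 30059567,
                 "Summer": 23778392, "Autumn": 125062444}
budget_totals = {"Winter": 357700000, "Spring": 125635000,
                 "Summer": 58100000, "Autumn": 375637650}

def per_movie_average(totals, counts):
    """Divide each seasonal total by the number of movies released in that season."""
    return {season: totals[season] / counts[season] for season in totals}

avg_tickets = per_movie_average(ticket_totals, season_counts)
avg_budget = per_movie_average(budget_totals, season_counts)

print(avg_budget["Winter"])  # 25550000.0
print(avg_budget["Autumn"])  # 15651568.75
```

    Read this way, Winter R-rated releases sold far more tickets per movie than any other season, but they also had the highest average budget.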
    

    Checking which season each movie in the PG-rated category was released in. Based on the output below (1 = Winter, 2 = Spring, 3 = Summer, 4 = Autumn), 13 movies were released in 'Winter', 13 in 'Autumn', 9 in 'Spring' and 11 in 'Summer'.

    In [961]:
    collections.Counter(season_pg)
    
    Out[961]:
    Counter({4: 13, 2: 9, 3: 11, 1: 13})

    Getting the index of the PG-rated movies that were released in Winter, Spring, Summer and Autumn.

    In [468]:
    index_one = []
    index_two = []
    index_three = []
    index_four = []
    for x,i in enumerate(df_data.System):
        if i == 'PG':
            if df_data['Season'][x] ==1: index_one.append(x)
            if df_data['Season'][x] ==2: index_two.append(x)
            if df_data['Season'][x] ==3: index_three.append(x)
            if df_data['Season'][x] ==4: index_four.append(x)
    

    Getting the number of Tickets sold in the Winter by PG-rated Drama movies.

    In [469]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['No.Tickets'][i]))
    total13 = sum(sum1)
    total13
    
    Out[469]:
    109181083

    Getting the Average Net Profit Margin made in the Winter by PG-rated Drama movies.

    In [470]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['NPM'][i]))
    total14 = sum(sum1)
    total14/13
    
    Out[470]:
    79.46153846153847

    Getting the Amount of Expenses spent in the Winter to produce PG-rated Drama movies.

    In [471]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['Budget'][i]))
    total15 = sum(sum1)
    total15
    
    Out[471]:
    182122000

    Getting the number of Tickets sold in the Spring by PG-rated Drama movies.

    In [472]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['No.Tickets'][i]))
    total16 = sum(sum2)
    total16
    
    Out[472]:
    100311458

    Getting the Average Net Profit Margin made in the Spring by PG-rated Drama movies.

    In [473]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['NPM'][i]))
    total17 = sum(sum2)
    total17/9
    
    Out[473]:
    65.55555555555556

    Getting the Amount of Expenses spent in the Spring to produce PG-rated Drama movies.

    In [474]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['Budget'][i]))
    total18 = sum(sum2)
    total18
    
    Out[474]:
    204500000

    Getting the number of Tickets sold in the Summer by PG-rated Drama movies.

    In [475]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['No.Tickets'][i]))
    total19 = sum(sum3)
    total19
    
    Out[475]:
    131797019

    Getting the Average Net Profit Margin made in the Summer by PG-rated Drama movies.

    In [476]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['NPM'][i]))
    total20 = sum(sum3)
    total20/11
    
    Out[476]:
    75.36363636363636

    Getting the Amount of Expenses spent in the Summer to produce PG-rated Drama movies.

    In [477]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['Budget'][i]))
    total21 = sum(sum3)
    total21
    
    Out[477]:
    222500000

    Getting the number of Tickets sold in the Autumn by PG-rated Drama movies.

    In [478]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['No.Tickets'][i]))
    total22 = sum(sum4)
    total22
    
    Out[478]:
    133239118

    Getting the Average Net Profit Margin made in the Autumn by PG-rated Drama movies.

    In [479]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['NPM'][i]))
    total23 = sum(sum4)
    total23/13
    
    Out[479]:
    58.53846153846154

    Getting the Amount of Expenses spent in the Autumn to produce PG-rated Drama movies.

    In [480]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['Budget'][i]))
    total24 = sum(sum4)
    total24
    
    Out[480]:
    383600000

    Putting the average number of Tickets sold per movie (rounded down to a whole number) in Winter, Spring, Summer and Autumn by PG-rated Drama Movies in a list for the Javascript graph below.

    In [486]:
    pg_total_tick = [total13//13, total16//9, total19//11, total22//13]
    print(pg_total_tick)
    
    [8398544, 11145717, 11981547, 10249162]
    

    Putting the average Net Profit Margin (rounded down to a whole number) made in Winter, Spring, Summer and Autumn by PG-rated Drama Movies in a list for the Javascript graph below.

    In [485]:
    pg_total_pro = [total14//13, total17//9, total20//11, total23//13]
    print(pg_total_pro)
    
    [79, 65, 75, 58]
    

    Putting the average Expenses spent per movie (rounded down to a whole number) in Winter, Spring, Summer and Autumn by PG-rated Drama Movies in a list for the Javascript graph below.

    In [487]:
    pg_total_bud = [total15//13, total18//9, total21//11, total24//13]
    print(pg_total_bud)
    
    [14009384, 22722222, 20227272, 29507692]
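
    The PG-rated lists use floor division (`//`) while the R-rated lists earlier use true division (`/`), so the two sets of averages are not rounded the same way. A short sketch of the difference, using the Winter PG ticket total and movie count printed above (variable names are illustrative):

```python
total_winter_pg_tickets = 109181083  # total13 from the output above
winter_pg_movies = 13                # Winter count from Counter(season_pg)

# Floor division drops the fractional part and returns an int.
floor_avg = total_winter_pg_tickets // winter_pg_movies

# True division keeps the fraction and returns a float.
true_avg = total_winter_pg_tickets / winter_pg_movies

print(floor_avg)           # 8398544
print(round(true_avg, 2))  # 8398544.85
```

    For ticket counts the difference is negligible, but for the Net Profit Margin list floor division turns 79.46 into 79 and 65.56 into 65, which matters if the R and PG values are compared on the same graph.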
    

    Checking which season each movie in the NC-17 rated category was released in. Based on the output below (1 = Winter, 2 = Spring, 3 = Summer, 4 = Autumn), 8 movies were released in 'Winter', 14 in 'Autumn', 8 in 'Spring' and 4 in 'Summer'.

    In [977]:
    collections.Counter(season_nc17)
    
    Out[977]:
    Counter({1: 8, 2: 8, 4: 14, 3: 4})

    Getting the index of the NC-17 rated movies that were released in Winter, Spring, Summer and Autumn.

    In [493]:
    index_one = []
    index_two = []
    index_three = []
    index_four = []
    for x,i in enumerate(df_data.System):
        if i == 'NC-17':
            if df_data['Season'][x] ==1: index_one.append(x)
            if df_data['Season'][x] ==2: index_two.append(x)
            if df_data['Season'][x] ==3: index_three.append(x)
            if df_data['Season'][x] ==4: index_four.append(x)
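
    The four index lists can also be built in one pass and keyed by season code. This is a minimal sketch on hypothetical stand-in rows (the real values live in df_data); indices_by_season is an illustrative helper name, not something defined in the notebook.

```python
from collections import defaultdict

# Hypothetical stand-in rows; the notebook reads these columns from df_data.
rows = [
    {"System": "NC-17", "Season": 1, "NPM": 60},
    {"System": "PG", "Season": 1, "NPM": 80},
    {"System": "NC-17", "Season": 4, "NPM": 70},
    {"System": "NC-17", "Season": 4, "NPM": 75},
]

def indices_by_season(rows, rating):
    """Map each season code to the row positions of movies with the given rating."""
    by_season = defaultdict(list)
    for position, row in enumerate(rows):
        if row["System"] == rating:
            by_season[row["Season"]].append(position)
    return dict(by_season)

nc17_index = indices_by_season(rows, "NC-17")
print(nc17_index)
```

    With a pandas DataFrame the same selection is usually written as a boolean mask, for example df_data[(df_data['System'] == 'NC-17') & (df_data['Season'] == 1)].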
    

    Getting the number of Tickets sold in the Winter by NC-17 rated Drama movies.

    In [494]:
    sum1 = []
    for i in index_one:
        sum1.append(int(df_data['No.Tickets'][i]))
    total25 = sum(sum1)
    total25
    
    Out[494]:
    16479358

    Getting the Average Net Profit Margin made in the Winter by NC-17 rated Drama movies.

    In [497]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['NPM'][i]))
    total26 = sum(sum1)
    total26/8
    
    Out[497]:
    57.875

    Getting the Amount of Expenses spent in the Winter to produce NC-17 rated Drama movies.

    In [498]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['Budget'][i]))
    total27 = sum(sum1)
    total27
    
    Out[498]:
    58250000

    Getting the number of Tickets sold in the Spring by NC-17 rated Drama movies.

    In [499]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['No.Tickets'][i]))
    total28 = sum(sum2)
    total28
    
    Out[499]:
    22453884

    Getting the Average Net Profit Margin made in the Spring by NC-17 rated Drama movies.

    In [500]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['NPM'][i]))
    total29 = sum(sum2)
    total29/8
    
    Out[500]:
    62.875

    Getting the Amount of Expenses spent in the Spring to produce NC-17 rated Drama movies.

    In [501]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['Budget'][i]))
    total30 = sum(sum2)
    total30
    
    Out[501]:
    33165116

    Getting the number of Tickets sold in the Summer by NC-17 rated Drama movies.

    In [502]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['No.Tickets'][i]))
    total31 = sum(sum3)
    total31
    
    Out[502]:
    8314920

    Getting the Average Net Profit Margin made in the Summer by NC-17 rated Drama movies.

    In [503]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['NPM'][i]))
    total32 = sum(sum3)
    total32/4
    
    Out[503]:
    71.25

    Getting the Amount of Expenses spent in the Summer to produce NC-17 rated Drama movies.

    In [504]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['Budget'][i]))
    total33 = sum(sum3)
    total33
    
    Out[504]:
    37404765

    Getting the number of Tickets sold in the Autumn by NC-17 rated Drama movies.

    In [505]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['No.Tickets'][i]))
    total34 = sum(sum4)
    total34
    
    Out[505]:
    48856406

    Getting the Average Net Profit Margin made in the Autumn by NC-17 rated Drama movies.

    In [506]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['NPM'][i]))
    total35 = sum(sum4)
    total35/14
    
    Out[506]:
    72.5

    Getting the Amount of Expenses spent in the Autumn to produce NC-17 rated Drama movies.

    In [507]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['Budget'][i]))
    total36 = sum(sum4)
    total36
    
    Out[507]:
    72404940

    Collecting the average number of tickets sold per NC-17 rated Drama movie in Winter, Spring, Summer and Autumn into a list for the JavaScript graph below.

    In [508]:
    nc_total_tick = [total25//8, total28//8, total31//4, total34//14]
    print(nc_total_tick)
    
    [2059919, 2806735, 2078730, 3489743]
    

    Collecting the average Net Profit Margin of NC-17 rated Drama movies in Winter, Spring, Summer and Autumn into a list for the JavaScript graph below.

    In [511]:
    nc_total_pro = [total26/8, total29/8, total32/4, total35/14]
    print(nc_total_pro)
    
    [57.875, 62.875, 71.25, 72.5]
    

    Collecting the average expenses per NC-17 rated Drama movie in Winter, Spring, Summer and Autumn into a list for the JavaScript graph below.

    In [510]:
    nc_total_bud = [total27//8, total30//8, total33//4, total36//14]
    print(nc_total_bud)
    
    [7281250, 4145639, 9351191, 5171781]
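
    The averages above divide each total by a movie count typed in by hand (8, 8, 4 and 14). A small helper that takes the divisor from the list itself avoids a mismatch if the data changes. This is a sketch only: season_average is a hypothetical name, and the totals and counts are the ones printed above for NC-17 tickets.

```python
def season_average(values, whole_number=True):
    """Average a list of per-movie values; the divisor comes from the list itself."""
    if not values:
        return None  # no movies were released in this season
    total = sum(values)
    return total // len(values) if whole_number else total / len(values)

# (total, number of movies) pairs printed above for NC-17 ticket sales.
printed = [(16479358, 8), (22453884, 8), (8314920, 4), (48856406, 14)]
nc_total_tick_check = [total // count for total, count in printed]
print(nc_total_tick_check)

# Hypothetical per-movie values, to show the helper itself.
print(season_average([10, 20, 33]))
print(season_average([1, 2], whole_number=False))
```

    In the notebook the helper could be fed the per-season lists directly, for example season_average(sum1), so that the counts never need to be typed in.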
    

    Checking which season each movie in the PG-13 rated category was released in. Based on the output below, 20 movies were released in Winter, 14 in Spring, 10 in Summer and 20 in Autumn.

    In [992]:
    collections.Counter(season_pg13)
    
    Out[992]:
    Counter({4: 20, 1: 20, 3: 10, 2: 14})

    Getting the row indices of the PG-13 rated movies that were released in Winter (1), Spring (2), Summer (3) and Autumn (4).

    In [513]:
    index_one = []
    index_two = []
    index_three = []
    index_four = []
    for x,i in enumerate(df_data.System):
        if i == 'PG-13':
            if df_data['Season'][x] ==1: index_one.append(x)
            if df_data['Season'][x] ==2: index_two.append(x)
            if df_data['Season'][x] ==3: index_three.append(x)
            if df_data['Season'][x] ==4: index_four.append(x)
    

    Getting the number of Tickets sold in the Winter by PG-13 rated Drama movies.

    In [514]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['No.Tickets'][i]))
    total37 = sum(sum1)
    total37
    
    Out[514]:
    237229054

    Getting the Average Net Profit Margin made in the Winter by PG-13 rated Drama movies.

    In [515]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['NPM'][i]))
    total38 = sum(sum1)
    total38/20
    
    Out[515]:
    68.45

    Getting the Amount of Expenses spent in the Winter to produce PG-13 rated Drama movies.

    In [516]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['Budget'][i]))
    total39 = sum(sum1)
    total39
    
    Out[516]:
    499100000

    Getting the number of Tickets sold in the Spring by PG-13 rated Drama movies.

    In [517]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['No.Tickets'][i]))
    total40 = sum(sum2)
    total40
    
    Out[517]:
    103122577

    Getting the Average Net Profit Margin made in the Spring by PG-13 rated Drama movies.

    In [518]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['NPM'][i]))
    total41 = sum(sum2)
    total41/14
    
    Out[518]:
    65.0

    Getting the Amount of Expenses spent in the Spring to produce PG-13 rated Drama movies.

    In [519]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['Budget'][i]))
    total42 = sum(sum2)
    total42
    
    Out[519]:
    271600000

    Getting the number of Tickets sold in the Summer by PG-13 rated Drama movies.

    In [520]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['No.Tickets'][i]))
    total43 = sum(sum3)
    total43
    
    Out[520]:
    101386726

    Getting the Average Net Profit Margin made in the Summer by PG-13 rated Drama movies.

    In [521]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['NPM'][i]))
    total44 = sum(sum3)
    total44/10
    
    Out[521]:
    72.3

    Getting the Amount of Expenses spent in the Summer to produce PG-13 rated Drama movies.

    In [522]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['Budget'][i]))
    total45 = sum(sum3)
    total45
    
    Out[522]:
    190175000

    Getting the number of Tickets sold in the Autumn by PG-13 rated Drama movies.

    In [523]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['No.Tickets'][i]))
    total46 = sum(sum4)
    total46
    
    Out[523]:
    226553982

    Getting the Average Net Profit Margin made in the Autumn by PG-13 rated Drama movies.

    In [524]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['NPM'][i]))
    total47 = sum(sum4)
    total47/20
    
    Out[524]:
    65.05

    Getting the Amount of Expenses spent in the Autumn to produce PG-13 rated Drama movies.

    In [525]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['Budget'][i]))
    total48 = sum(sum4)
    total48
    
    Out[525]:
    619650000

    Collecting the average number of tickets sold per PG-13 rated Drama movie in Winter, Spring, Summer and Autumn into a list for the JavaScript graph below.

    In [526]:
    pg13_total_tick = [total37//20, total40//14, total43//10, total46//20]
    print(pg13_total_tick)
    
    [11861452, 7365898, 10138672, 11327699]
    

    Collecting the average Net Profit Margin of PG-13 rated Drama movies in Winter, Spring, Summer and Autumn into a list for the JavaScript graph below.

    In [527]:
    pg13_total_pro = [total38/20, total41/14, total44/10, total47/20]
    print(pg13_total_pro)
    
    [68.45, 65.0, 72.3, 65.05]
    

    Collecting the average expenses per PG-13 rated Drama movie in Winter, Spring, Summer and Autumn into a list for the JavaScript graph below.

    In [528]:
    pg13_total_bud = [total39//20, total42//14, total45//10, total48//20]
    print(pg13_total_bud)
    
    [24955000, 19400000, 19017500, 30982500]
    

    Checking which season each movie in the G rated category was released in. Based on the output below, 3 movies were released in Winter, 8 in Spring, 7 in Summer and 7 in Autumn.

    In [1006]:
    collections.Counter(season_g)
    
    Out[1006]:
    Counter({2: 8, 4: 7, 3: 7, 1: 3})

    Getting the row indices of the G rated movies that were released in Winter (1), Spring (2), Summer (3) and Autumn (4).

    In [548]:
    index_one = []
    index_two = []
    index_three = []
    index_four = []
    for x,i in enumerate(df_data.System):
        if i == 'G':
            if df_data['Season'][x] ==1: index_one.append(x)
            if df_data['Season'][x] ==2: index_two.append(x)
            if df_data['Season'][x] ==3: index_three.append(x)
            if df_data['Season'][x] ==4: index_four.append(x)
    

    Getting the number of Tickets sold in the Winter by G-rated Drama movies.

    In [549]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['No.Tickets'][i]))
    total49 = sum(sum1)
    total49
    
    Out[549]:
    16707096

    Getting the Average Net Profit Margin made in the Winter by G-rated Drama movies.

    In [550]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['NPM'][i]))
    total50 = sum(sum1)
    total50/3
    
    Out[550]:
    64.66666666666667

    Getting the Amount of Expenses spent in the Winter to produce G-rated Drama movies.

    In [551]:
    sum1 =[]
    for i in index_one:
        sum1.append(int(df_data['Budget'][i]))
    total51 = sum(sum1)
    total51
    
    Out[551]:
    77666000

    Getting the number of Tickets sold in the Spring by G-rated Drama movies.

    In [552]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['No.Tickets'][i]))
    total52 = sum(sum2)
    total52
    
    Out[552]:
    82454882

    Getting the Average Net Profit Margin made in the Spring by G-rated Drama movies.

    In [553]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['NPM'][i]))
    total53 = sum(sum2)
    total53/8
    
    Out[553]:
    75.25

    Getting the Amount of Expenses spent in the Spring to produce G-rated Drama movies.

    In [554]:
    sum2 =[]
    for i in index_two:
        sum2.append(int(df_data['Budget'][i]))
    total54 = sum(sum2)
    total54
    
    Out[554]:
    85100000

    Getting the number of Tickets sold in the Summer by G-rated Drama movies.

    In [555]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['No.Tickets'][i]))
    total55 = sum(sum3)
    total55
    
    Out[555]:
    193789041

    Getting the Average Net Profit Margin made in the Summer by G-rated Drama movies.

    In [556]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['NPM'][i]))
    total56 = sum(sum3)
    total56/7
    
    Out[556]:
    73.14285714285714

    Getting the Amount of Expenses spent in the Summer to produce G-rated Drama movies.

    In [557]:
    sum3 =[]
    for i in index_three:
        sum3.append(int(df_data['Budget'][i]))
    total57 = sum(sum3)
    total57
    
    Out[557]:
    193858000

    Getting the number of Tickets sold in the Autumn by G-rated Drama movies.

    In [558]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['No.Tickets'][i]))
    total58 = sum(sum4)
    total58
    
    Out[558]:
    74232412

    Getting the Average Net Profit Margin made in the Autumn by G-rated Drama movies.

    In [559]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['NPM'][i]))
    total59 = sum(sum4)
    total59/7
    
    Out[559]:
    74.71428571428571

    Getting the Amount of Expenses spent in the Autumn to produce G-rated Drama movies.

    In [560]:
    sum4 =[]
    for i in index_four:
        sum4.append(int(df_data['Budget'][i]))
    total60 = sum(sum4)
    total60
    
    Out[560]:
    135850000

    Collecting the average number of tickets sold per G-rated Drama movie in Winter, Spring, Summer and Autumn into a list for the JavaScript graph below.

    In [561]:
    g_total_tick = [total49//3, total52//8, total55//7, total58//7]
    print(g_total_tick)
    
    [5569032, 10306860, 27684148, 10604630]
    

    Collecting the average Net Profit Margin of G-rated Drama movies in Winter, Spring, Summer and Autumn into a list for the JavaScript graph below.

    In [562]:
    g_total_pro = [total50/3, total53/8, total56/7, total59/7]
    print(g_total_pro)
    
    [64.66666666666667, 75.25, 73.14285714285714, 74.71428571428571]
    

    Collecting the average expenses per G-rated Drama movie in Winter, Spring, Summer and Autumn into a list for the JavaScript graph below.

    In [563]:
    g_total_bud = [total51//3, total54//8, total57//7, total60//7]
    print(g_total_bud)
    
    [25888666, 10637500, 27694000, 19407142]
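
    With the per-season lists in place, one simple way to read them is to pick, for each rating, the season with the highest average Net Profit Margin. The sketch below uses the four margin lists printed above; ranking by margin alone is an assumption made for illustration, and some groups are small (for example only 3 G-rated Winter movies), so the result is indicative at best.

```python
SEASONS = ["Winter", "Spring", "Summer", "Autumn"]

# Average Net Profit Margin lists as printed above.
avg_npm = {
    "PG": [79, 65, 75, 58],
    "NC-17": [57.875, 62.875, 71.25, 72.5],
    "PG-13": [68.45, 65.0, 72.3, 65.05],
    "G": [64.66666666666667, 75.25, 73.14285714285714, 74.71428571428571],
}

# For each rating, find the position of the largest margin and name its season.
best_season = {
    rating: SEASONS[values.index(max(values))]
    for rating, values in avg_npm.items()
}
print(best_season)
```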
    

    This is the HTML script for the Highcharts library, which will be used to visualize the average number of tickets sold, the average expenses and the average Net Profit Margin for Winter, Spring, Summer and Autumn. One chart is drawn for each of the five system ratings (R, PG, PG-13, NC-17 and G) of the movies in the Drama genre within the 'Drama_DataFrame' dataframe, to see which season is the best time to release movies per system rating. The visualisation is a stacked bar chart with positive and negative numbers on a logarithmic axis, with expenses plotted as negative values. This will be done using JavaScript and HTML below.
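
    The chart scripts take their numbers as hand-typed literals. One way to reduce copy errors is to generate the series block from the Python lists and paste the result into the script. The sketch below uses the PG lists printed earlier; build_series is a hypothetical helper, and it relies only on the 'name', 'color' and 'data' keys that the chart scripts already use.

```python
import json

# Lists as printed earlier for PG-rated Drama movies.
pg_total_pro = [79, 65, 75, 58]
pg_total_bud = [14009384, 22722222, 20227272, 29507692]
pg_total_tick = [8398544, 11145717, 11981547, 10249162]

def build_series(margins, budgets, tickets):
    """Return Highcharts-style series dicts; expenses are negated so they plot as negative bars."""
    return [
        {"name": "Avg Net Profit Margin", "color": "#FFC300", "data": list(margins)},
        {"name": "Avg Expenses", "color": "#581845", "data": [-b for b in budgets]},
        {"name": "Avg No. Tickets Sold", "color": "#C70039", "data": list(tickets)},
    ]

series_json = json.dumps(build_series(pg_total_pro, pg_total_bud, pg_total_tick), indent=2)
print(series_json)
```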

    In [124]:
    %%html 
    <script src="https://code.highcharts.com/highcharts.js" ></script>
    <script src="https://cloud.highcharts.com/embed"></script>
    <script src="https://code.highcharts.com/modules/data.js" ></script>
    <script src="https://code.highcharts.com/modules/exporting.js" ></script>
    <script src="https://code.highcharts.com/modules/export-data.js" ></script>
    <script src="https://code.highcharts.com/modules/accessibility.js" ></script>
    <figure class="highcharts-figure">
     <table>
        <tr>
        <td><div id='v2' ></div></td>
        <td><div id='v3' ></div></td>
        <td><div id='v4' ></div></td>
        <td><div id='v5' ></div></td>
        <td><div id='v6' ></div></td>
        </tr>
        </table>
    </figure>
    
    In [125]:
    %%js inline
    (function (H) {
        H.addEvent(H.Axis, 'afterInit', function () {
            const logarithmic = this.logarithmic;
    
            if (logarithmic && this.options.custom.allowNegativeLog) {
    
                // Avoid errors on negative numbers on a log axis
                this.positiveValuesOnly = false;
    
                // Override the converter functions
                logarithmic.log2lin = num => {
                    const isNegative = num < 0;
    
                    let adjustedNum = Math.abs(num);
    
                    if (adjustedNum < 10) {
                        adjustedNum += (10 - adjustedNum) / 10;
                    }
    
                    const result = Math.log(adjustedNum) / Math.LN10;
                    return isNegative ? -result : result;
                };
    
                logarithmic.lin2log = num => {
                    const isNegative = num < 0;
    
                    let result = Math.pow(10, Math.abs(num));
                    if (result < 10) {
                        result = (10 * (result - 1)) / (10 - 1);
                    }
                    return isNegative ? -result : result;
                };
            }
        });
    }(Highcharts));
    
    Highcharts.chart('v2', {
        chart: {
            height: 500,
            width: 500,
            type: 'bar',
            
        },
        title: {
            text: 'System Rating R'
        },
        subtitle: {
            text: 'Drama Movies'
        },
        xAxis: {
            categories: ['Winter', 'Spring', 'Summer', 'Autumn']
        },
        
        yAxis: {
            type: 'logarithmic',
            custom: {
                allowNegativeLog: true
            },
            
        },
       
        plotOptions: {
             bar: {
                        dataLabels: {
                            enabled: true,
                                    }
                    },
    				series: {
                        stacking: 'normal',
    					dataLabels: {
    						enabled: true,
                            
                    style: {
                        textOutline: false ,
                        fontWeight: 'bold'
                    }
                        
                            
    					}
    				}
    				
        },
        legend: {
            enabled: true,
            verticalAlign: 'top',
            symbolRadius: 3,
            reversed: true
        },
        credits: {
            enabled: true
        },
        tooltip:{
            shared:true,
            formatter: function () {
                
    					var txt = '<span style="font-size: 10px">' + this.x + '</span><br/>',
    					point;
    					for(var i = this.points.length - 1; i >= 0; i--) {
    						point = this.points[i];
    						if (point) {
    							txt += '<span style="color:' + point.color + '">●</span> ' + point.series.name + ': <b>' + point.y + '</b><br/>';
    						}
                        }
                return txt;
    				}
        },
       series: [{
            name: 'Avg Net Profit Margin',
           //data: [73, 51, 77, 59],
            color: '#FFC300',
            data:[{
                name:'Avg Net Profit Margin',
                y:73,
                color: '#FFC300',
            },{
               name:'Avg Net Profit Margin',
                y:51,
                color: '#FFC300',
            },{
               name:'Avg Net Profit Margin',
                y:77,
                color: '#FFC300',
            },{
               name:'Avg Net Profit Margin',
                y:59,
                color: '#FFC300',
            }],
            tooltip: {
            valueSuffix: '%'
          },
            stack: 'female'
        },{
            name: 'Avg Expenses',
            //data:[-25550000, -11421364, -8300000, -15651570],
            color: '#581845',
             data:[{
                //name:'System Rating: R',
                y:-25550000,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-11421364,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-8300000,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-15651570,
                color: '#581845',
            }],
            tooltip: {
            valuePrefix: '-$'
          },
            stack: 'male'
        }, {
            name: 'Avg No. Tickets Sold',
            //data: [17186733, 2732688, 3396913, 5210935],
            color: '#C70039',
           data:[{
                //name:'System Rating: R',
                y:17186733,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:2732688,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:3396913,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:5210935,
                color: '#C70039',
            }],
            tooltip: {
            valueSuffix: ' tickets'
          },
            stack: 'male'
        }]
    });
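
    The two converter functions in the script above are what allow negative expenses and positive ticket counts to share one logarithmic axis. The sketch below is a Python port written for illustration, assuming the same formulas as the JavaScript; it is not part of the notebook and only shows that the pair round-trips values of either sign.

```python
import math

def log2lin(num):
    """Signed log10 of num, with a linear blend for magnitudes below 10."""
    adjusted = abs(num)
    if adjusted < 10:
        adjusted += (10 - adjusted) / 10
    result = math.log10(adjusted)
    return -result if num < 0 else result

def lin2log(num):
    """Inverse of log2lin: map an axis position back to a signed value."""
    result = 10 ** abs(num)
    if result < 10:
        result = (10 * (result - 1)) / (10 - 1)
    return -result if num < 0 else result

# Sample values taken from the R-rated chart data, plus two small magnitudes.
for value in (-25550000, 73, 5, 0):
    print(value, round(log2lin(value), 3), lin2log(log2lin(value)))
```

    The blend below 10 maps 0 to 0, so the axis stays continuous around zero instead of diverging as a plain logarithm would.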
    
    In [126]:
    %%html
    <link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
    
    In [110]:
    %%js
    (function (H) {
        H.addEvent(H.Axis, 'afterInit', function () {
            const logarithmic = this.logarithmic;
    
            if (logarithmic && this.options.custom.allowNegativeLog) {
    
                // Avoid errors on negative numbers on a log axis
                this.positiveValuesOnly = false;
    
                // Override the converter functions
                logarithmic.log2lin = num => {
                    const isNegative = num < 0;
    
                    let adjustedNum = Math.abs(num);
    
                    if (adjustedNum < 10) {
                        adjustedNum += (10 - adjustedNum) / 10;
                    }
    
                    const result = Math.log(adjustedNum) / Math.LN10;
                    return isNegative ? -result : result;
                };
    
                logarithmic.lin2log = num => {
                    const isNegative = num < 0;
    
                    let result = Math.pow(10, Math.abs(num));
                    if (result < 10) {
                        result = (10 * (result - 1)) / (10 - 1);
                    }
                    return isNegative ? -result : result;
                };
            }
        });
    }(Highcharts));
    
    Highcharts.chart('v3', {
        chart: {
            height: 500,
            width: 500,
            type: 'bar',
        },
        title: {
            text: 'System Rating PG'
        },
        subtitle: {
            text: 'Drama Movies'
        },
        xAxis: {
            categories: ['Winter', 'Spring', 'Summer', 'Autumn']
        },
        
        yAxis: {
            type: 'logarithmic',
            custom: {
                allowNegativeLog: true
            }
        },
       
        plotOptions: {
             bar: {
                        dataLabels: {
                            enabled: true,
                                    }
                    },
    				series: {
                        stacking: 'normal',
    					dataLabels: {
    						enabled: true,
                            
                    style: {
                        textOutline: false ,
                        fontWeight: 'bold'
                    }
                        
                            
    					}
    				}
    				
        },
        legend: {
            enabled: true,
            verticalAlign: 'top',
            symbolRadius: 3,
            reversed: true
        },
        credits: {
            enabled: true
        },
        tooltip:{
            shared:true,
            formatter: function () {
    					var txt = '<span style="font-size: 10px">' + this.x + '</span><br/>',
    					point;
    					for(var i = this.points.length - 1; i >= 0; i--) {
    						point = this.points[i];
    						if (point) {
    							txt += '<span style="color:' + point.color + '">●</span> ' + point.series.name + ': <b>' + point.y + '</b><br/>';
    						}
                        }
                return txt;
    				}
        },
       series: [{
            name: 'Avg Net Profit Margin',
            color: '#FFC300',
            //data: [80, 66, 75, 60],
            data:[{
                name:'Avg Net Profit Margin',
                y:80,
                color: '#FFC300',
            },{
               name:'Avg Net Profit Margin',
                y:66,
                color: '#FFC300',
            },{
               name:'Avg Net Profit Margin',
                y:75,
                color: '#FFC300',
            },{
               name:'Avg Net Profit Margin',
                y:60,
                color: '#FFC300',
            }],
            tooltip: {
            valueSuffix: '%'
          },
            stack: 'female'
        },{
            name: 'Avg Expenses',
            color: '#581845',
            //data: [-14009384, -22722222, -20227272, -29507692],
             data:[{
                //name:'System Rating: R',
                y:-14009384,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-22722222,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-20227272,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-29507692,
                color: '#581845',
            }],
            stack: 'male'
        }, {
            name: 'Avg No. Tickets Sold',
            color: '#C70039',
            //data: [8398544, 11145717, 11981547, 10249162],
           data:[{
                //name:'System Rating: R',
                y:8398544,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:11145717,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:11981547,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:10249162,
                color: '#C70039',
            }],
            stack: 'male'
        }]
    });
    
    In [111]:
    %%js
    (function (H) {
        H.addEvent(H.Axis, 'afterInit', function () {
            const logarithmic = this.logarithmic;
    
            if (logarithmic && this.options.custom.allowNegativeLog) {
    
                // Avoid errors on negative numbers on a log axis
                this.positiveValuesOnly = false;
    
                // Override the converter functions
                logarithmic.log2lin = num => {
                    const isNegative = num < 0;
    
                    let adjustedNum = Math.abs(num);
    
                    if (adjustedNum < 10) {
                        adjustedNum += (10 - adjustedNum) / 10;
                    }
    
                    const result = Math.log(adjustedNum) / Math.LN10;
                    return isNegative ? -result : result;
                };
    
                logarithmic.lin2log = num => {
                    const isNegative = num < 0;
    
                    let result = Math.pow(10, Math.abs(num));
                    if (result < 10) {
                        result = (10 * (result - 1)) / (10 - 1);
                    }
                    return isNegative ? -result : result;
                };
            }
        });
    }(Highcharts));
    
    Highcharts.chart('v4', {
        chart: {
            height: 500,
            width: 500,
            type: 'bar',
        },
        title: {
            text: 'System Rating NC-17'
        },
        subtitle: {
            text: 'Drama Movies'
        },
        xAxis: {
            categories: ['Winter', 'Spring', 'Summer', 'Autumn']
        },
        
        yAxis: {
            type: 'logarithmic',
            custom: {
                allowNegativeLog: true
            }
        },
       
        plotOptions: {
             bar: {
                        dataLabels: {
                            enabled: true,
                                    }
                    },
    				series: {
                        stacking: 'normal',
    					dataLabels: {
    						enabled: true,
                            
                    style: {
                        textOutline: false ,
                        fontWeight: 'bold'
                    }
                        
                            
    					}
    				}
    				
        },
        legend: {
            enabled: true,
            verticalAlign: 'top',
            symbolRadius: 3,
            reversed: true
        },
        credits: {
            enabled: true
        },
        tooltip:{
            shared:true,
            formatter: function () {
    					var txt = '<span style="font-size: 10px">' + this.x + '</span><br/>',
    					point;
    					for(var i = this.points.length - 1; i >= 0; i--) {
    						point = this.points[i];
    						if (point) {
    							txt += '<span style="color:' + point.color + '">●</span> ' + point.series.name + ': <b>' + point.y + '</b><br/>';
    						}
                        }
                return txt;
    				}
        },
       series: [{
            name: 'Avg Net Profit Margin',
            color: '#FFC300',
            //data: [56, 63, 71, 73],
           data:[{
                //name:'System Rating: R',
                y:56,
                color: '#FFC300',
            },{
               //name:'System Rating: R',
                y:63,
                color: '#FFC300',
            },{
               //name:'System Rating: R',
                y:71,
                color: '#FFC300',
            },{
               //name:'System Rating: R',
                y:73,
                color: '#FFC300',
            }],
            tooltip: {
            valueSuffix: '%'
          },
            stack: 'female'
        },{
            name: 'Avg Expenses',
            color: '#581845',
            //data: [-7281250, -4145639, -9351191, -5171781],
            data:[{
                //name:'System Rating: NC-17',
                y:-7281250,
                color: '#581845',
            },{
               //name:'System Rating: NC-17',
                y:-4145639,
                color: '#581845',
            },{
               //name:'System Rating: NC-17',
                y:-9351191,
                color: '#581845',
            },{
               //name:'System Rating: NC-17',
                y:-5171781,
                color: '#581845',
            }],
            stack: 'male'
        }, {
            name: 'Avg No. Tickets Sold',
            color: '#C70039',
            //data: [2059919, 2806735, 2078730, 3489743],
            data:[{
                //name:'System Rating: R',
                y:2059919,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:2806735,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:2078730,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:3489743,
                color: '#C70039',
            }],
            stack: 'male'
        }]
    });
    
    In [112]:
    %%js
    (function (H) {
        H.addEvent(H.Axis, 'afterInit', function () {
            const logarithmic = this.logarithmic;
    
            if (logarithmic && this.options.custom.allowNegativeLog) {
    
                // Avoid errors on negative numbers on a log axis
                this.positiveValuesOnly = false;
    
                // Override the converter functions
                logarithmic.log2lin = num => {
                    const isNegative = num < 0;
    
                    let adjustedNum = Math.abs(num);
    
                    if (adjustedNum < 10) {
                        adjustedNum += (10 - adjustedNum) / 10;
                    }
    
                    const result = Math.log(adjustedNum) / Math.LN10;
                    return isNegative ? -result : result;
                };
    
                logarithmic.lin2log = num => {
                    const isNegative = num < 0;
    
                    let result = Math.pow(10, Math.abs(num));
                    if (result < 10) {
                        result = (10 * (result - 1)) / (10 - 1);
                    }
                    return isNegative ? -result : result;
                };
            }
        });
    }(Highcharts));
    
    Highcharts.chart('v5', {
        chart: {
            height: 500,
            width: 500,
            type: 'bar',
        },
        title: {
            text: 'System Rating PG-13'
        },
        subtitle: {
            text: 'Drama Movies'
        },
        xAxis: {
            categories: ['Winter', 'Spring', 'Summer', 'Autumn']
        },
        
        yAxis: {
            type: 'logarithmic',
            custom: {
                allowNegativeLog: true
            }
        },
       
        plotOptions: {
             bar: {
                        dataLabels: {
                            enabled: true,
                                    }
                    },
    				series: {
                        stacking: 'normal',
    					dataLabels: {
    						enabled: true,
                            
                    style: {
                        textOutline: false ,
                        fontWeight: 'bold'
                    }
                        
                            
    					}
    				}
    				
        },
        legend: {
            enabled: true,
            verticalAlign: 'top',
            symbolRadius: 3,
            reversed: true
        },
        credits: {
            enabled: true
        },
        tooltip:{
            shared:true,
            formatter: function () {
    					var txt = '<span style="font-size: 10px">' + this.x + '</span><br/>',
    					point;
    					for(var i = this.points.length - 1; i >= 0; i--) {
    						point = this.points[i];
    						if (point) {
    							txt += '<span style="color:' + point.color + '">●</span> ' + point.series.name + ': <b>' + point.y + '</b><br/>';
    						}
                        }
                return txt;
    				}
        },
       series: [{
            name: 'Avg Net Profit Margin',
            color: '#FFC300',
            //data: [69, 65, 72, 65],
            data:[{
                //name:'System Rating: R',
                y:69,
                color: '#FFC300',
            },{
               //name:'System Rating: R',
                y:65,
                color: '#FFC300',
            },{
               //name:'System Rating: R',
                y:72,
                color: '#FFC300',
            },{
               //name:'System Rating: R',
                y:65,
                color: '#FFC300',
            }],
            tooltip: {
            valueSuffix: '%'
          },
            stack: 'female'
        },{
            name: 'Avg Expenses',
            color: '#581845',
            //data: [-24955000, -19400000, -19017500, -30982500],
           data:[{
                //name:'System Rating: R',
                y:-24955000,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-19400000,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-19017500,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-30982500,
                color: '#581845',
            }],
            stack: 'male'
        }, {
            name: 'Avg No. Tickets Sold',
            color: '#C70039',
            //data: [11861452, 7365898, 10138672, 11327699],
             data:[{
                //name:'System Rating: R',
                y:11861452,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:7365898,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:10138672,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:11327699,
                color: '#C70039',
            }],
            stack: 'male'
        }]
    });
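
The `log2lin` and `lin2log` overrides used by these bar charts can be tried outside Highcharts to see why negative expenses and positive ticket counts can share one logarithmic axis. The sketch below copies the two converters into standalone functions; the names `signedLog` and `signedExp` are illustrative only and are not part of the Highcharts API.

```javascript
// Standalone copies of the two converters from the plugin cell.
// signedLog / signedExp are hypothetical names chosen for this sketch.
function signedLog(num) {
    const isNegative = num < 0;
    let adjusted = Math.abs(num);
    // Magnitudes below 10 are nudged upwards so that 0 maps to log10(1) = 0
    if (adjusted < 10) {
        adjusted += (10 - adjusted) / 10;
    }
    const result = Math.log(adjusted) / Math.LN10;
    // The sign is restored after taking the logarithm of the magnitude
    return isNegative ? -result : result;
}

function signedExp(num) {
    const isNegative = num < 0;
    let result = Math.pow(10, Math.abs(num));
    // Undo the nudge applied to magnitudes below 10
    if (result < 10) {
        result = (10 * (result - 1)) / (10 - 1);
    }
    return isNegative ? -result : result;
}

// Round-trip a few values taken from the PG-13 chart, plus 5 and 0
const samples = [69, -24955000, 11861452, 5, 0];
for (const value of samples) {
    const axisPosition = signedLog(value);
    console.log(value, '->', axisPosition.toFixed(3), '->', Math.round(signedExp(axisPosition)));
}
```

Because the sign is stripped before the logarithm and restored afterwards, a negative expense lands to the left of zero at the same distance a positive value of equal magnitude would land to the right.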
    
    In [113]:
    %%js
    (function (H) {
        H.addEvent(H.Axis, 'afterInit', function () {
            const logarithmic = this.logarithmic;
    
            if (logarithmic && this.options.custom.allowNegativeLog) {
    
                // Avoid errors on negative numbers on a log axis
                this.positiveValuesOnly = false;
    
                // Override the converter functions
                logarithmic.log2lin = num => {
                    const isNegative = num < 0;
    
                    let adjustedNum = Math.abs(num);
    
                    if (adjustedNum < 10) {
                        adjustedNum += (10 - adjustedNum) / 10;
                    }
    
                    const result = Math.log(adjustedNum) / Math.LN10;
                    return isNegative ? -result : result;
                };
    
                logarithmic.lin2log = num => {
                    const isNegative = num < 0;
    
                    let result = Math.pow(10, Math.abs(num));
                    if (result < 10) {
                        result = (10 * (result - 1)) / (10 - 1);
                    }
                    return isNegative ? -result : result;
                };
            }
        });
    }(Highcharts));
    
    Highcharts.chart('v6', {
        chart: {
            height: 500,
            width: 500,
            type: 'bar',
        },
        title: {
            text: 'System Rating G'
        },
        subtitle: {
            text: 'Drama Movies'
        },
        xAxis: {
            categories: ['Winter', 'Spring', 'Summer', 'Autumn']
        },
        
        yAxis: {
            type: 'logarithmic',
            custom: {
                allowNegativeLog: true
            }
        },
       
        plotOptions: {
             bar: {
                        dataLabels: {
                            enabled: true,
                                    }
                    },
    				series: {
                        stacking: 'normal',
    					dataLabels: {
    						enabled: true,
                            
                    style: {
                        textOutline: false ,
                        fontWeight: 'bold'
                    }
                        
                            
    					}
    				}
    				
        },
        legend: {
            enabled: true,
            verticalAlign: 'top',
            symbolRadius: 3,
            reversed: true
        },
        credits: {
            enabled: true
        },
        tooltip:{
            shared:true,
            formatter: function () {
    					var txt = '<span style="font-size: 10px">' + this.x + '</span><br/>',
    					point;
    					for(var i = this.points.length - 1; i >= 0; i--) {
    						point = this.points[i];
    						if (point) {
    							txt += '<span style="color:' + point.color + '">●</span> ' + point.series.name + ': <b>' + point.y + '</b><br/>';
    						}
                        }
                return txt;
    				}
        },
       series: [{
            name: 'Avg Net Profit Margin',
            color: '#FFC300',
            //data: [65, 75, 73, 75] ,
           data:[{
                //name:'System Rating: R',
                y:65,
                color: '#FFC300',
            },{
               //name:'System Rating: R',
                y:75,
                color: '#FFC300',
            },{
               //name:'System Rating: R',
                y:73,
                color: '#FFC300',
            },{
               //name:'System Rating: R',
                y:75,
                color: '#FFC300',
            }],
            tooltip: {
            valueSuffix: '%'
          },
            stack: 'female'
        },{
            name: 'Avg Expenses',
            color: '#581845',
            //data: [-25888666, -10637500, -27694000, -19407142],
            data:[{
                //name:'System Rating: R',
                y:-25888666,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-10637500,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-27694000,
                color: '#581845',
            },{
               //name:'System Rating: R',
                y:-19407142,
                color: '#581845',
            }],
            stack: 'male'
        }, {
            name: 'Avg No. Tickets Sold',
            color: '#C70039',
            //data: [5569032, 10306860, 27684148, 10604630],
            data:[{
                //name:'System Rating: R',
                y:5569032,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:10306860,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:27684148,
                color: '#C70039',
            },{
               //name:'System Rating: R',
                y:10604630,
                color: '#C70039',
            }],
            stack: 'male'
        }]
    });
    

    The HTML script below loads the Highcharts library to visualize the total number of tickets sold under each System Rating ('R', 'PG', 'PG-13', 'NC-17' and 'G') for all movies in the Drama genre within the 'Drama_DataFrame' dataframe. The chart combines a ring chart with a pie chart and is built with JavaScript and HTML.
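
As a rough illustration of how the two `data` arrays for such a chart could be derived, the sketch below computes an outer-ring percentage per rating and an inner-pie share per movie. It assumes one record per movie with `name`, `rating` and `tickets` fields; those field names, the helper functions and the sample numbers are all hypothetical and are not taken from 'Drama_DataFrame'. The per-movie values in the chart appear to be each film's share within its own rating group, which is what `innerPie` reproduces.

```javascript
// Hypothetical input: made-up ticket counts, not values from Drama_DataFrame.
const movies = [
    { name: 'Movie A', rating: 'R', tickets: 600 },
    { name: 'Movie B', rating: 'R', tickets: 400 },
    { name: 'Movie C', rating: 'PG-13', tickets: 1500 },
    { name: 'Movie D', rating: 'PG-13', tickets: 500 }
];

// Colours reused from the chart configuration for these two ratings.
const ratingColors = { 'R': '#581845', 'PG-13': '#900C3F' };

// Sum the tickets sold under each rating.
function totalsByRating(records) {
    const totals = {};
    for (const record of records) {
        totals[record.rating] = (totals[record.rating] || 0) + record.tickets;
    }
    return totals;
}

// Outer ring: each rating's rounded percentage of all tickets sold.
function outerRing(records) {
    const totals = totalsByRating(records);
    const grandTotal = Object.values(totals).reduce((sum, n) => sum + n, 0);
    return Object.keys(totals).map(rating => ({
        name: 'System Rating: ' + rating,
        y: Math.round((100 * totals[rating]) / grandTotal),
        color: ratingColors[rating]
    }));
}

// Inner pie: each movie's fraction of the tickets sold under its own rating.
function innerPie(records) {
    const totals = totalsByRating(records);
    return records.map(record => ({
        name: record.name,
        y: record.tickets / totals[record.rating],
        color: ratingColors[record.rating]
    }));
}

console.log(outerRing(movies));
console.log(innerPie(movies));
```

With shares computed this way, every rating group sums to 1 in the inner pie, so the inner slices will not line up with the outer-ring percentages. Multiplying each share by its rating's percentage would be one way to align them, if that is the intended reading of the chart.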

    In [56]:
    %%html
    <script type="text/javascript" src="js/script.js"></script>
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/modules/series-label.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/export-data.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
    <div id='harph' style="height:700px"></div>
    
    In [57]:
    %%js %%html inline
    var chart = new Highcharts.Chart({
        chart:{
            renderTo:'harph',
            height:593,
            width:1000
        },
        title:{
            text:'Number of Tickets Sold'
        },
        subtitle:{
            text:'Drama Genre'
        },
        series:[{
            type:'pie',
            name: 'Percentage of Tickets Sold',
            tooltip:{
                valueSuffix:'%'
            },
            shadow: true,
            data:[{
                name:'System Rating: R',
                y:21,
                sliced:true,
                selected:true,
                color: '#581845',
            },{
                name:'System Rating: PG-13',
                y:33,
                sliced:true,
                color:'#900C3F',
            },{
                name:'System Rating: PG',
                y:24,
                sliced:true,
                color:'#C70039',
            },{
                name:'System Rating: NC-17',
                y:5,
                sliced:true,
                color:'#FF5733',
            },{
                name:'System Rating: G',
                y:18,
                sliced:true,
                color:'#FFAA00',
            }],
            innerSize:'70%',
            size:'97%'
           },{
            type:'pie',
            shadow: true,
            dataLabels: { enabled: false },
            name: 'No. of Tickets Sold',
            data:[{ 
               name: ' Fifty Shades of Grey ' ,
               y: 0.126950353271215 ,
               color:"#581845",
             },{ 
               name: ' Django Unchained ' ,
               y: 0.1000372823968936 ,
               color:"#581845",
             },{ 
               name: ' Fifty Shades Darker ' ,
               y: 0.08479655719100986 ,
               color:"#581845",
             },{ 
               name: ' Fifty Shades Freed ' ,
               y: 0.08256260817064842 ,
               color:"#581845",
             },{ 
               name: ' Gone Girl ' ,
               y: 0.08194376649358207 ,
               color:"#581845",
             },{ 
               name: ' Black Swan ' ,
               y: 0.07365072819242854 ,
               color:"#581845",
             },{ 
               name: ' Flight ' ,
               y: 0.035697055171768834 ,
               color:"#581845",
             },{ 
               name: ' The Wolfman ' ,
               y: 0.03171198361362964 ,
               color:"#581845",
             },{ 
               name: ' Zero Dark Thirty ' ,
               y: 0.02992846528333498 ,
               color:"#581845",
             },{ 
               name: ' Priest ' ,
               y: 0.018710016439102733 ,
               color:"#581845",
             },{ 
               name: ' The Ides of March ' ,
               y: 0.017283074157099485 ,
               color:"#581845",
             },{ 
               name: ' Manchester by the Sea ' ,
               y: 0.01728261837935836 ,
               color:"#581845",
             },{ 
               name: ' Fame ' ,
               y: 0.0171665551334068 ,
               color:"#581845",
             },{ 
               name: ' Crimson Peak ' ,
               y: 0.016667425147527084 ,
               color:"#581845",
             },{ 
               name: ' Hereditary ' ,
               y: 0.015592912448024023 ,
               color:"#581845",
             },{ 
               name: ' Boyhood ' ,
               y: 0.01273355188120584 ,
               color:"#581845",
             },{ 
               name: ' Quartet ' ,
               y: 0.012490297742501055 ,
               color:"#581845",
             },{ 
               name: ' Ordinary People ' ,
               y: 0.012176362481024666 ,
               color:"#581845",
             },{ 
               name: ' Downsizing ' ,
               y: 0.012108785093504838 ,
               color:"#581845",
             },{ 
               name: ' The Master ' ,
               y: 0.011260471551964184 ,
               color:"#581845",
             },{ 
               name: ' The Debt ' ,
               y: 0.010361506651894932 ,
               color:"#581845",
             },{ 
               name: ' Carol ' ,
               y: 0.009525425740264925 ,
               color:"#581845",
             },{ 
               name: ' The Witch ' ,
               y: 0.008994277923897528 ,
               color:"#581845",
             },{ 
               name: ' Whiplash ' ,
               y: 0.0086640102561464 ,
               color:"#581845",
             },{ 
               name: ' Ex Machina ' ,
               y: 0.008528244071941818 ,
               color:"#581845",
             },{ 
               name: ' For Colored Girls ' ,
               y: 0.008452536054181487 ,
               color:"#581845",
             },{ 
               name: ' Room ' ,
               y: 0.008062325831900929 ,
               color:"#581845",
             },{ 
               name: ' Arbitrage ' ,
               y: 0.00796626344721367 ,
               color:"#581845",
             },{ 
               name: ' Endless Love ' ,
               y: 0.007718911755450848 ,
               color:"#581845",
             },{ 
               name: ' Nocturnal Animals ' ,
               y: 0.007203218139466748 ,
               color:"#581845",
             },{ 
               name: ' The Water Diviner ' ,
               y: 0.006904416922301841 ,
               color:"#581845",
             },{ 
               name: ' Let Me In ' ,
               y: 0.006285375147690608 ,
               color:"#581845",
             },{ 
               name: ' Biutiful ' ,
               y: 0.005488791268114878 ,
               color:"#581845",
             },{ 
               name: ' Before Midnight ' ,
               y: 0.005169615674268552 ,
               color:"#581845",
             },{ 
               name: ' Melancholia ' ,
               y: 0.004850653517803438 ,
               color:"#581845",
             },{ 
               name: ' Buried ' ,
               y: 0.00472903645332829 ,
               color:"#581845",
             },{ 
               name: ' Margin Call ' ,
               y: 0.004542932396748761 ,
               color:"#581845",
             },{ 
               name: ' If Beale Street Could Talk ' ,
               y: 0.004415301289396786 ,
               color:"#581845",
             },{ 
               name: ' Mommy ' ,
               y: 0.0038987894967847116 ,
               color:"#581845",
             },{ 
               name: ' Addicted ' ,
               y: 0.00389061662372918 ,
               color:"#581845",
             },{ 
               name: ' Silent House ' ,
               y: 0.0036930803274185455 ,
               color:"#581845",
             },{ 
               name: ' Blue Valentine ' ,
               y: 0.0036831821688648927 ,
               color:"#581845",
             },{ 
               name: ' Winter\'s Bone ' ,
               y: 0.0035865372779914128 ,
               color:"#581845",
             },{ 
               name: ' Unsane ' ,
               y: 0.0031670839111789142 ,
               color:"#581845",
             },{ 
               name: ' Rich and Famous ' ,
               y: 0.0028902978705634837 ,
               color:"#581845",
             },{ 
               name: ' Stoker ' ,
               y: 0.0026757288265710135 ,
               color:"#581845",
             },{ 
               name: ' Chloe ' ,
               y: 0.0026304222957969038 ,
               color:"#581845",
             },{ 
               name: ' The Florida Project ' ,
               y: 0.0025112953341025483 ,
               color:"#581845",
             },{ 
               name: ' Never Let Me Go ' ,
               y: 0.0024842599324825083 ,
               color:"#581845",
             },{ 
               name: ' Raggedy Man ' ,
               y: 0.002445636659707563 ,
               color:"#581845",
             },{ 
               name: ' We Need to Talk About Kevin ' ,
               y: 0.0023934512200015122 ,
               color:"#581845",
             },{ 
               name: ' We Are Your Friends ' ,
               y: 0.0022574160157643607 ,
               color:"#581845",
             },{ 
               name: ' Pennies from Heaven ' ,
               y: 0.002039058458255398 ,
               color:"#581845",
             },{ 
               name: ' The Homesman ' ,
               y: 0.0018270173132466435 ,
               color:"#581845",
             },{ 
               name: ' Miss Sloane ' ,
               y: 0.0017163100115798451 ,
               color:"#581845",
             },{ 
               name: ' The Immigrant ' ,
               y: 0.001686379865477133 ,
               color:"#581845",
             },{ 
               name: ' Tulip Fever ' ,
               y: 0.0015102406666328858 ,
               color:"#581845",
             },{ 
               name: ' Knock Knock ' ,
               y: 0.0014070236830629552 ,
               color:"#581845",
             },{ 
               name: ' Martha Marcy May Marlene ' ,
               y: 0.0012092361531681874 ,
               color:"#581845",
             },{ 
               name: ' Take Shelter ' ,
               y: 0.0011054322167999271 ,
               color:"#581845",
             },{ 
               name: ' Stone ' ,
               y: 0.000903778357676767 ,
               color:"#581845",
             },{ 
               name: ' By the Sea ' ,
               y: 0.0008287929143840789 ,
               color:"#581845",
             },{ 
               name: ' Zoot Suit ' ,
               y: 0.000723926237721873 ,
               color:"#581845",
             },{ 
               name: ' Everything Must Go ' ,
               y: 0.0006271968612183303 ,
               color:"#581845",
             },{ 
               name: ' A Ghost Story ' ,
               y: 0.0006158068643022558 ,
               color:"#581845",
             },{ 
               name: ' The Hand ' ,
               y: 0.000544171943233367 ,
               color:"#581845",
             },{ 
               name: ' Coriolanus ' ,
               y: 0.0004845962342028908 ,
               color:"#581845",
             },{ 
               name: ' Locke ' ,
               y: 0.000464313013069698 ,
               color:"#581845",
             },{ 
               name: ' Ghost Story ' ,
               y: 0.0004339181960016415 ,
               color:"#581845",
             },{ 
               name: ' Palo Alto ' ,
               y: 0.00025708310236240476 ,
               color:"#581845",
             },{ 
               name: ' I Origins ' ,
               y: 0.00018951460806679334 ,
               color:"#581845",
             },{ 
               name: ' Stake Land ' ,
               y: 0.00015106919977619045 ,
               color:"#581845",
             },{ 
               name: ' One from the Heart ' ,
               y: 0.0001415801295365251 ,
               color:"#581845",
             },{ 
               name: ' The Reluctant Fundamentalist ' ,
               y: 0.00011755286100792544 ,
               color:"#581845",
             },{ 
               name: ' Sound of My Voice ' ,
               y: 9.547987850103754e-05 ,
               color:"#581845",
             },{ 
               name: ' Hesher ' ,
               y: 8.514150534863738e-05 ,
               color:"#581845",
             },{ 
               name: ' The Canyons ' ,
               y: 1.3868983166596162e-05 ,
               color:"#581845",
            },
                
                
                { 
               name: ' Gravity ' ,
               y: 0.10042430006814071 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Sing ' ,
               y: 0.09184777334318545 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' A Quiet Place ' ,
               y: 0.04842760737950036 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' True Grit ' ,
               y: 0.03652123784321127 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Creed II ' ,
               y: 0.030920888022590957 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Help ' ,
               y: 0.0308526277418438 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Me Before You ' ,
               y: 0.03014981553669598 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Arrival ' ,
               y: 0.029406105359216384 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Vow ' ,
               y: 0.02860848125707642 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Post ' ,
               y: 0.026021608866616704 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Creed ' ,
               y: 0.025126763953606857 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Impossible ' ,
               y: 0.024551032089162032 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Step Up Revolution ' ,
               y: 0.023966418802457628 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Bridge of Spies ' ,
               y: 0.023524309274997962 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Lights Out ' ,
               y: 0.02154219152868317 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Dear John ' ,
               y: 0.020561688301883676 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Contagion ' ,
               y: 0.019912856613967363 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Woman in Black ' ,
               y: 0.01866848901001518 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Water for Elephants ' ,
               y: 0.01691012954105202 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Hereafter ' ,
               y: 0.0157303625217635 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Rite ' ,
               y: 0.014063191445323746 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Lucky One ' ,
               y: 0.013989337388600871 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Safe Haven ' ,
               y: 0.013615422996981813 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Burlesque ' ,
               y: 0.01310899083645976 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Collateral Beauty ' ,
               y: 0.012349894879717762 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Rings ' ,
               y: 0.012003641015419623 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Ouija: Origin of Evil ' ,
               y: 0.011846510053157636 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' If I Stay ' ,
               y: 0.011343345271615188 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Book Thief ' ,
               y: 0.011014803583575563 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Anna Karenina ' ,
               y: 0.010279088857626476 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Age of Adaline ' ,
               y: 0.009986647581467405 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Giver ' ,
               y: 0.009632789135078054 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Fences ' ,
               y: 0.009306004911850674 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Longest Ride ' ,
               y: 0.009236524249854158 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Brooklyn ' ,
               y: 0.008986542976119461 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Tree of Life ' ,
               y: 0.008935250772031564 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Everything, Everything ' ,
               y: 0.008918068441012986 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' One Day ' ,
               y: 0.008565641734903134 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Remember Me ' ,
               y: 0.008180190904166454 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Roommate ' ,
               y: 0.007606856372282654 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Charlie St. Cloud ' ,
               y: 0.0070179999806649915 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Trouble with the Curve ' ,
               y: 0.006922574273886692 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Still Alice ' ,
               y: 0.006036704881334775 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Dream House ' ,
               y: 0.006028389495929878 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Best of Me ' ,
               y: 0.005944026841948274 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Beastly ' ,
               y: 0.005505212199095423 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Gifted ' ,
               y: 0.005351242936297972 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Amour ' ,
               y: 0.005325529517850589 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Courageous ' ,
               y: 0.005093735254359923 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Suffragette ' ,
               y: 0.004928561067635958 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Perks of Being a Wallflower ' ,
               y: 0.0047873254625720495 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Project Almanac ' ,
               y: 0.004764183096436486 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Mud ' ,
               y: 0.004568389355969665 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Bye Bye Man ' ,
               y: 0.0045149372363135045 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Victor Frankenstein ' ,
               y: 0.004505764833471335 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Draft Day ' ,
               y: 0.00432091398964024 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Upside Down ' ,
               y: 0.0038199583442621154 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Wish Upon ' ,
               y: 0.0033987313785130402 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Light Between Oceans ' ,
               y: 0.003225647152469375 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Black or White ' ,
               y: 0.0031806667659938295 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Country Strong ' ,
               y: 0.002982477140630577 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Before I Fall ' ,
               y: 0.0027426990069261224 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Space Between Us ' ,
               y: 0.002385953917344334 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Words ' ,
               y: 0.0023697849515387473 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Paranoia ' ,
               y: 0.0023655954102153195 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Anonymous ' ,
               y: 0.0022895553799615618 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Ida ' ,
               y: 0.002214689405690285 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Labor Day ' ,
               y: 0.0020542085475670634 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Midnight Special ' ,
               y: 0.001111842596726238 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Rabbit Hole ' ,
               y: 0.0008982802210818929 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Mustang ' ,
               y: 0.0008038273449080661 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' The Beaver ' ,
               y: 0.0007304973427667674 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Like Crazy ' ,
               y: 0.0005397472657314678 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Another Earth ' ,
               y: 0.0003044120146536895 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Anna ' ,
               y: 0.00017371975079867003 , 
               sliced:true,
               color:"#900C3F",
             },{ 
               name: ' Maggie ' ,
               y: 0.00014878517590070093 , 
               sliced:true,
               color:"#900C3F",
                
            }
    
                  ,{ 
               name: ' Tex ' ,
               y: 0.10992102523245781 , 
               color:"#C70039",
             },{ 
               name: ' Cinderella ' ,
               y: 0.10851702629705251 , 
               color:"#C70039",
             },{ 
               name: ' Wonder ' ,
               y: 0.06121391899642231 , 
               color:"#C70039",
             },{ 
               name: ' Wonder ' ,
               y: 0.06094720207717018 , 
               color:"#C70039",
             },{ 
               name: ' Little Women ' ,
               y: 0.043338915265064594 , 
               color:"#C70039",
             },{ 
               name: ' Hugo ' ,
               y: 0.036025077981249466 , 
               color:"#C70039",
             },{ 
               name: ' Contact ' ,
               y: 0.03423881834270405 , 
               color:"#C70039",
             },{ 
               name: ' Resurrection ' ,
               y: 0.031473064673483604 , 
               color:"#C70039",
             },{ 
               name: ' Phenomenon ' ,
               y: 0.03042038310878855 , 
               color:"#C70039",
             },{ 
               name: ' Bridge to Terabithia ' ,
               y: 0.027529273427924796 , 
               color:"#C70039",
             },{ 
               name: ' Sense and Sensibility ' ,
               y: 0.0269281584279092 , 
               color:"#C70039",
             },{ 
               name: ' Forever Young ' ,
               y: 0.02560226914729842 , 
               color:"#C70039",
             },{ 
               name: ' Rocky III ' ,
               y: 0.025021318835561402 , 
               color:"#C70039",
             },{ 
               name: ' On Golden Pond ' ,
               y: 0.023867369638086482 , 
               color:"#C70039",
             },{ 
               name: ' The Lake House ' ,
               y: 0.022975921543411725 , 
               color:"#C70039",
             },{ 
               name: ' Mr. Holland\'s Opus ' ,
               y: 0.021263155570788162 , 
               color:"#C70039",
             },{ 
               name: ' Dolphin Tale ' ,
               y: 0.019222026117505144 , 
               color:"#C70039",
             },{ 
               name: ' The Last Song ' ,
               y: 0.018543779884263614 , 
               color:"#C70039",
             },{ 
               name: ' The Last Song ' ,
               y: 0.01783509453584228 , 
               color:"#C70039",
             },{ 
               name: ' Footloose ' ,
               y: 0.01600868559832901 , 
               color:"#C70039",
             },{ 
               name: ' War Room ' ,
               y: 0.014803758436182365 , 
               color:"#C70039",
             },{ 
               name: ' War Room ' ,
               y: 0.01480142543096974 , 
               color:"#C70039",
             },{ 
               name: ' Staying Alive ' ,
               y: 0.012984128419475586 , 
               color:"#C70039",
             },{ 
               name: ' God\'s Not Dead ' ,
               y: 0.012939149039390006 , 
               color:"#C70039",
             },{ 
               name: ' August Rush ' ,
               y: 0.012926721684865472 , 
               color:"#C70039",
             },{ 
               name: ' The Remains of the Day ' ,
               y: 0.012796507580034979 , 
               color:"#C70039",
             },{ 
               name: ' The Natural ' ,
               y: 0.009604138096565115 , 
               color:"#C70039",
             },{ 
               name: ' A Walk to Remember ' ,
               y: 0.009503078553444008 , 
               color:"#C70039",
             },{ 
               name: ' Urban Cowboy ' ,
               y: 0.00938770284197271 , 
               color:"#C70039",
             },{ 
               name: ' We Are Marshall ' ,
               y: 0.00871282606051339 , 
               color:"#C70039",
             },{ 
               name: ' A River Runs Through It ' ,
               y: 0.008691803002392428 , 
               color:"#C70039",
             },{ 
               name: ' Absence of Malice ' ,
               y: 0.00814690222317329 , 
               color:"#C70039",
             },{ 
               name: ' Dreamer ' ,
               y: 0.007751685937913325 , 
               color:"#C70039",
             },{ 
               name: ' Overcomer ' ,
               y: 0.0076238828719174916 , 
               color:"#C70039",
             },{ 
               name: ' The Majestic ' ,
               y: 0.007464482191583959 , 
               color:"#C70039",
             },{ 
               name: ' Taps ' ,
               y: 0.007174301162444658 , 
               color:"#C70039",
             },{ 
               name: ' The Indian in the Cupboard ' ,
               y: 0.007134299927272464 , 
               color:"#C70039",
             },{ 
               name: ' Fireproof ' ,
               y: 0.006697545744744855 , 
               color:"#C70039",
             },{ 
               name: ' The Age of Innocence ' ,
               y: 0.006453868752613964 , 
               color:"#C70039",
             },{ 
               name: ' The Jazz Singer ' ,
               y: 0.0054259378521386 , 
               color:"#C70039",
             },{ 
               name: ' Tuck Everlasting ' ,
               y: 0.0038705917063661553 , 
               color:"#C70039",
             },{ 
               name: ' Akeelah and the Bee ' ,
               y: 0.003791317549827424 , 
               color:"#C70039",
             },{ 
               name: ' Honeysuckle Rose ' ,
               y: 0.0035645778554022458 , 
               color:"#C70039",
             },{ 
               name: ' Extraordinary Measures ' ,
               y: 0.0031667604494077946 , 
               color:"#C70039",
             },{ 
               name: ' Pure Country ' ,
               y: 0.0030341993333299544 , 
               color:"#C70039",
             },{ 
               name: ' The Night the Lights Went Out in Georgia ' ,
               y: 0.002986036581637784 , 
               color:"#C70039",
             },{ 
               name: ' Ragtime ' ,
               y: 0.002985442325593059 , 
               color:"#C70039",
             },{ 
               name: ' Music of the Heart ' ,
               y: 0.002973159033139973 , 
               color:"#C70039",
             },{ 
               name: ' The Spanish Prisoner ' ,
               y: 0.002768218731331894 , 
               color:"#C70039",
             },{ 
               name: ' The Lunchbox ' ,
               y: 0.002447354481836171 , 
               color:"#C70039",
             },{ 
               name: ' Gettysburg ' ,
               y: 0.002154920481968384 , 
               color:"#C70039",
             },{ 
               name: ' Somewhere in Time ' ,
               y: 0.0019427570679668466 , 
               color:"#C70039",
             },{ 
               name: ' What If... ' ,
               y: 0.0017059930544033786 , 
               color:"#C70039",
             },{ 
               name: ' Tender Mercies ' ,
               y: 0.001689351884288976 , 
               color:"#C70039",
             },{ 
               name: ' Three Wishes ' ,
               y: 0.0014057056707795462 , 
               color:"#C70039",
             },{ 
               name: ' Six Weeks ' ,
               y: 0.0013341788523053774 , 
               color:"#C70039",
             },{ 
               name: ' The Secret of Roan Inish ' ,
               y: 0.0012208900400079781 , 
               color:"#C70039",
             },{ 
               name: ' Eddie and the Cruisers ' ,
               y: 0.0009577706708178526 , 
               color:"#C70039",
             },{ 
               name: ' Fluke ' ,
               y: 0.0007978977870279055 , 
               color:"#C70039",
             },{ 
               name: ' The Ultimate Gift ' ,
               y: 0.0006880444549621317 , 
               color:"#C70039",
             },{ 
               name: ' Looker ' ,
               y: 0.000656528875970674 , 
               color:"#C70039",
             },{ 
               name: ' Newsies ' ,
               y: 0.000564139068343821 , 
               color:"#C70039",
             },{ 
               name: ' Table for Five ' ,
               y: 0.0004802069048282557 , 
               color:"#C70039",
             },{ 
               name: ' Testament ' ,
               y: 0.0004091542906726049 , 
               color:"#C70039",
             },{ 
               name: ' Man, Woman and Child ' ,
               y: 0.0003413290670898207 , 
               color:"#C70039",
             },{ 
               name: ' Cattle Annie and Little Britches ' ,
               y: 0.00010701010701676988 , 
               color:"#C70039",
             },{ 
               name: ' Five Days One Summer ' ,
               y: 3.983316275550381e-05 , 
               color:"#C70039",
            }
            
                ,{ 
               name: ' Hell ' ,
               y: 0.00520639197113044 , 
               color:"#FF5733",
             },{ 
               name: ' Crash ' ,
               y: 0.09741626549901868 , 
               color:"#FF5733",
             },{ 
               name: ' Crash ' ,
               y: 0.09475588094154686 , 
               color:"#FF5733",
             },{ 
               name: ' Lust, Caution ' ,
               y: 0.06460065143400773 , 
               color:"#FF5733",
             },{ 
               name: ' Se, jie ' ,
               y: 0.06274762192347601 , 
               color:"#FF5733",
             },{ 
               name: ' Lust, Caution  ' ,
               y: 0.06274762192347601 , 
               color:"#FF5733",
             },{ 
               name: ' Natural Born Killers ' ,
               y: 0.0484164223116735 , 
               color:"#FF5733",
             },{ 
               name: ' Showgirls ' ,
               y: 0.036348982740728945 , 
               color:"#FF5733",
             },{ 
               name: ' Last Tango in Paris ' ,
               y: 0.034805467094213366 , 
               color:"#FF5733",
             },{ 
               name: ' Shame ' ,
               y: 0.019654866958915027 , 
               color:"#FF5733",
             },{ 
               name: ' Shame ' ,
               y: 0.019654866958915027 , 
               color:"#FF5733",
             },{ 
               name: ' Shame ' ,
               y: 0.019654866958915027 , 
               color:"#FF5733",
             },{ 
               name: ' Shame ' ,
               y: 0.019654866958915027 , 
               color:"#FF5733",
             },{ 
               name: ' Kids ' ,
               y: 0.019654269980860305 , 
               color:"#FF5733",
             },{ 
               name: ' Kids ' ,
               y: 0.019654269980860305 , 
               color:"#FF5733",
             },{ 
               name: ' Showgirls ' ,
               y: 0.01959508249533823 , 
               color:"#FF5733",
             },{ 
               name: ' Blue Is the Warmest Colour ' ,
               y: 0.018743031123720486 , 
               color:"#FF5733",
             },{ 
               name: ' Blue Is the Warmest Colour ' ,
               y: 0.018743031123720486 , 
               color:"#FF5733",
             },{ 
               name: ' Matador ' ,
               y: 0.016711794035176298 , 
               color:"#FF5733",
             },{ 
               name: ' Blue Valentine ' ,
               y: 0.015951099563287444 , 
               color:"#FF5733",
             },{ 
               name: ' The Dreamers ' ,
               y: 0.01473872379225418 , 
               color:"#FF5733",
             },{ 
               name: ' The Dreamers ' ,
               y: 0.014559678519229444 , 
               color:"#FF5733",
             },{ 
               name: ' Beyond the Valley of the Dolls ' ,
               y: 0.00866581047175382 , 
               color:"#FF5733",
             },{ 
               name: ' Happiness 1998 ' ,
               y: 0.005533071842823304 , 
               color:"#FF5733",
             },{ 
               name: ' Killer Joe ' ,
               y: 0.0044861071363392156 , 
               color:"#FF5733",
             },{ 
               name: ' Clerks ' ,
               y: 0.003749638419058066 , 
               color:"#FF5733",
             },{ 
               name: ' Elles ' ,
               y: 0.0036803119352840355 , 
               color:"#FF5733",
             },{ 
               name: ' Arabian Nights ' ,
               y: 0.003325187022151564 , 
               color:"#FF5733",
             },{ 
               name: ' Frontier(s) ' ,
               y: 0.0026801811200606253 , 
               color:"#FF5733",
             },{ 
               name: ' The Evil Dead ' ,
               y: 0.0025630963919089293 , 
               color:"#FF5733",
             },{ 
               name: ' Young Adam ' ,
               y: 0.002466694064749819 , 
               color:"#FF5733",
             },{ 
               name: ' Two Girls and a Guy ' ,
               y: 0.002229067912936027 , 
               color:"#FF5733",
             },{ 
               name: ' Nymphomaniac: Vol. I ' ,
               y: 0.002016534096777114 , 
               color:"#FF5733",
             },{ 
               name: ' Bad Lieutenant ' ,
               y: 0.001963210476340922 , 
               color:"#FF5733",
             },{ 
               name: ' A Dirty Shame ' ,
               y: 0.0018430927145241121 , 
               color:"#FF5733",
             },{ 
               name: ' Wide Sargasso Sea ' ,
               y: 0.0015548197148420703 , 
               color:"#FF5733",
             },{ 
               name: ' Law of Desire ' ,
               y: 0.001416195633328915 , 
               color:"#FF5733",
             },{ 
               name: ' Queen of Hearts ' ,
               y: 0.0011909134470982216 , 
               color:"#FF5733",
             },{ 
               name: ' Ma mère ' ,
               y: 0.0009841953526336853 , 
               color:"#FF5733",
             },{ 
               name: ' Ma Mère ' ,
               y: 0.0009841953526336853 , 
               color:"#FF5733",
             },{ 
               name: ' Whore ' ,
               y: 0.0009709559199685058 , 
               color:"#FF5733",
             },{ 
               name: ' Whore 1991 ' ,
               y: 0.0009709559199685058 , 
               color:"#FF5733",
             },{ 
               name: ' The Big Feast ' ,
               y: 0.0006652164978467291 , 
               color:"#FF5733",
             },{ 
               name: ' Orgazmo ' ,
               y: 0.0006039973612029393 , 
               color:"#FF5733",
             },{ 
               name: ' Bent ' ,
               y: 0.00047764021584646666 , 
               color:"#FF5733",
             },{ 
               name: ' Pink Flamingos ' ,
               y: 0.00039843470813463673 , 
               color:"#FF5733",
             },{ 
               name: ' Tokyo Decadence ' ,
               y: 0.0002675231979413424 , 
               color:"#FF5733",
             },{ 
               name: ' Man Bites Dog ' ,
               y: 0.0001979367398531592 , 
               color:"#FF5733",
             },{ 
               name: ' Chained ' ,
               y: 0.006204461478904e-05 , 
               color:"#FF5733",
                }
    
                   ,{ 
               name: ' The Lion King 1994 ' ,
               y: 0.26147885261744513 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Beauty and the Beast 1991 ' ,
               y: 0.11630273554483538 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Hunchback of Notre Dame ' ,
               y: 0.0863010375487224 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Secret Garden ' ,
               y: 0.08253110067343736 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Sound of Music ' ,
               y: 0.07588504584079123 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Bambi 1942 ' ,
               y: 0.07105584658389433 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Babe ' ,
               y: 0.06524941732946415 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Charlotte\'s Web ' ,
               y: 0.03817547208967575 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Tale of Despereaux ' ,
               y: 0.02398991734505535 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Rookie ' ,
               y: 0.021394581337878135 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Rookie ' ,
               y: 0.021341019016509186 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' My Fair Lady 1964 ' ,
               y: 0.019108624607797248 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Babe: Pig in the City ' ,
               y: 0.018329189694847987 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Little Rascals ' ,
               y: 0.017750161433978465 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Ten Commandments 1966 ' ,
               y: 0.017366261012108503 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Hachiko: A Dog\'s Story ' ,
               y: 0.012648847449378404 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Black Stallion ' ,
               y: 0.010021960525247894 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Giant ' ,
               y: 0.008005557330788077 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Ramona and Beezus ' ,
               y: 0.007283123524021923 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Prancer ' ,
               y: 0.004928078239825991 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Kit Kittredge: An American Girl ' ,
               y: 0.004681723907847047 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Three Coins in the Fountain ' ,
               y: 0.0031816050709206414 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' A Little Princess ' ,
               y: 0.002655433875629345 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' A Little Princess ' ,
               y: 0.002655433875629345 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Secret Garden ' ,
               y: 0.002312295117392995 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' The Quiet Man ' ,
               y: 0.002015117295743652 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Lassie Come Home ' ,
               y: 0.0011976091754457114 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Pollyanna ' ,
               y: 0.0009942515846627004 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' A Sunday in the Country ' ,
               y: 0.0006392746042249663 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Little Dorrit ' ,
               y: 0.0002718230805716641 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Miracle of Marcelino ' ,
               y: 0.0001571871985288343 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' La traviata ' ,
               y: 5.183099794285635e-05 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Before the Wrath ' ,
               y: 2.8899579394195828e-05 , 
               sliced:true,
               color:"#FFAA00",
             },{ 
               name: ' Through the Olive Trees ' ,
               y: 1.0684890363175155e-05 , 
               sliced:true,
               color:"#FFAA00",}],
            innerSize:'50%',
            size:'57%'
        }]
    });
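
The point lists in the chart above repeat the same `color` (and, for some groups, `sliced: true`) on every entry. As a minimal sketch, the same objects could be generated from plain `[name, share]` pairs. The helper name `toPoints` is hypothetical and not part of the original notebook, and trimming the padded titles is an assumption about what is wanted:

```javascript
// Hypothetical helper, not from the original notebook: turns [name, share]
// pairs into Highcharts point objects that all share one color.
function toPoints(pairs, color, sliced) {
    return pairs.map(function (pair) {
        // Assumption: the padding spaces around each title are unintended.
        var point = { name: pair[0].trim(), y: pair[1], color: color };
        if (sliced) {
            point.sliced = true; // only set the flag for sliced groups
        }
        return point;
    });
}

// Usage with two entries copied from the "#C70039" group above.
var redGroup = toPoints([
    [' Tex ', 0.10992102523245781],
    [' Cinderella ', 0.10851702629705251]
], '#C70039', false);
```

The arrays returned for each group can then be joined with `concat` and passed as the series `data`, which keeps the shared styling in one place per group.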
    
    In [21]:
    %%js
    function dollarFormat(x) {
        return '$' + Highcharts.numberFormat(x, 0, '.', ',');
    }
    
    var colors = Highcharts.getOptions().colors;
    
    Highcharts.chart('container9', {
        chart: {
            type: 'column',
            inverted: false,
            height: 450,
            width: 1100,
            
        },
    
        accessibility: {
            series: {
                descriptionFormatter: function (series) {
                    return series.type === 'line' ?
                        series.name + ', ' + dollarFormat(series.points[0].y) :
                        series.name + ' net profits, bar series with ' +
                        series.points.length + ' bars.';
                }
            },
            point: {
                valuePrefix: '$'
            },
            keyboardNavigation: {
                seriesNavigation: {
                    mode: 'serialize'
                }
            }
        },
    
        title: {
            text: 'Total Net Profit of Each System Rating in the Drama Genre',
            margin: 35
        },
    
        subtitle: {
            text: 'There are five System Ratings: R-rated | G-rated | PG-rated | PG-13-rated | NC-17-rated'
        },
    
        xAxis: {
            visible: false,
            accessibility: {
                description: 'Drama movies',
                rangeDescription: ''
            }
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
    
        yAxis: [{
            min: 0,
            max: 900000000,
            tickInterval: 250000000,
            labels: {
                format: '${text}'
            },
            title: {
                text: 'Movie Profit'
            },
            gridLineWidth: 1
        }, {
            accessibility: {
                description: 'System Ratings Category Totals'
            },
            opposite: true,
            min: 0,
            max: 7000000000,
            tickInterval: 1000000000,
            gridLineWidth: 0,
            labels: {
                format: '${text}',
                style: {
                    color: '#8F6666'
                }
            },
            title: {
                text: 'System Ratings Category Total',
                style: {
                    color: '#8F6666'
                }
            }
        }],
    
        credits: {
            enabled: false
        },
    
        plotOptions: {
            column: {
                keys: ['name', 'y'],
                grouping: false,
                pointPadding: 0.1,
                groupPadding: 0,
                tooltip: {
                    headerFormat: '<span style="font-size: 10px">' +
                        '<span style="color:{point.color}">\u25CF</span> ' +
                        '{series.name}</span><br/>',
                    pointFormat: '{point.name}: <b>${point.y:,.0f}</b><br/>'
                }
            },
            line: {
                yAxis: 1,
                lineWidth: 5,
                accessibility: {
                    exposeAsGroupOnly: true
                },
                marker: {
                    enabled: false
                },
                enableMouseTracking: false,
                linkedTo: ':previous',
                dataLabels: {
                    enabled: true,
                    verticalAlign: 'bottom',
                    style: {
                        color: '#757575',
                        fontWeight: 'normal'
                    },
                    formatter: function () {
                        if (this.point === this.series.points[Math.floor(
                            this.series.points.length / 2
                        )]) {
                            return 'Total: $' + Highcharts.numberFormat(this.y, 0);
                        }
                    }
                }
            }
        },
    
        responsive: {
            rules: [{
                condition: {
                    maxWidth: 400
                },
                chartOptions: {
                    chart: {
                        spacingLeft: 3,
                        spacingRight: 5
                    },
                    yAxis: [{}, {
                        visible: false
                    }]
                }
            }]
        },
    
        series: [{
            name: 'System Rating R',
            color: '#ff0000',
            borderColor: '#A59273',
            borderWidth: 1,
            data: [
                [ ' Django Unchained ' , 349948323 ],
                [ ' Gone Girl ' , 307567189 ],
                [ ' Priest ' , 24154026 ],
                [ ' Fifty Shades Darker ' , 326398492 ],
                [ ' Fifty Shades Freed ' , 316350619 ],
                [ ' Crimson Peak ' , 19966854 ],
                [ ' Zero Dark Thirty ' , 82112435 ],
                [ ' The Master ' , 13147416 ],
                [ ' Flight ' , 129558438 ],
                [ ' The Ides of March ' , 54735925 ],
                [ ' Nocturnal Animals ' , 9898681 ],
                [ ' The Water Diviner ' , 8554727 ],
                [ ' For Colored Girls ' , 17017873 ],
                [ ' The Debt ' , 26604054 ],
                [ ' Let Me In ' , 8270399 ],
                [ ' Black Swan ' , 318266710 ],
                [ ' Ex Machina ' , 25358392 ],
                [ ' Room ' , 23262783 ],
                [ ' If Beale Street Could Talk ' , 7859167 ],
                [ ' Arbitrage ' , 23830713 ],
                [ ' Stoker ' , 34913 ],
                [ ' Carol ' , 31043521 ],
                [ ' Quartet ' , 45178935 ],
                [ ' Hereditary ' , 60133905 ],
                [ ' Melancholia ' , 12417298 ],
                [ ' Manchester by the Sea ' , 69233867 ],
                [ ' We Need to Talk About Kevin ' , 3765283 ],
                [ ' Addicted ' , 12499242 ],
                [ ' Mommy ' , 12636004 ],
                [ ' Take Shelter ' , 222016 ],
                [ ' Boyhood ' , 53273049 ],
                [ ' The Witch ' , 36954520 ],
                [ ' Margin Call ' , 17033227 ],
                [ ' Whiplash ' , 35669037 ],
                [ ' Before Midnight ' , 20251930 ],
                [ ' Silent House ' , 14610760 ],
                [ ' Winter\'s Bone ' , 14131551 ],
                [ ' The Florida Project ' , 9295324 ],
                [ ' We Are Your Friends ' , 8153415 ],
                [ ' Locke ' , 88390 ],
                [ ' Knock Knock ' , 4328516 ],
                [ ' Buried ' , 19282640 ],
                [ ' Unsane ' , 12744931 ],
                [ ' Blue Valentine ' , 15566240 ],
                [ ' Martha Marcy May Marlene ' , 4438911 ],
                [ ' Palo Alto ' , 156309 ],
                [ ' Sound of My Voice ' , 294448 ],
                [ ' A Ghost Story ' , 2669782 ],
                [ ' Ordinary People ' , 48766923 ],
                [ ' Fame ' , 68711836 ],
                [ ' Endless Love ' , 14718173 ],
                [ ' Ghost Story ' , 1851683 ],
                [ ' Zoot Suit ' , 556082 ],
                [ ' Rich and Famous ' , 1500000 ],
                [ ' Raggedy Man ' , 2000000 ],
               ]
        }, {
            type: 'line',
            name: 'System Rating R',
            data: [
                3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 
                3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 
                3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 
                3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 
                3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 
                3278073978, 3278073978, 3278073978
                 
            ],
            color: '#ff1919'
        }, {
            name: 'System Rating NC-17',
            color: '#d61111',
            data: [
                [ ' Shame ' , 13912841 ],
                [ ' Matador ' , 4856268 ],
                [ ' Whore ' , 8404 ],
                [ ' Tokyo Decadence ' , 257845 ],
                [ ' Wide Sargasso Sea ' , 659312 ],
                [ ' Kids ' , 18912216 ],
                [ ' Crash ' , 89410061 ],
                [ ' The Dreamers ' , 121165 ],
                [ ' Lust, Caution ' , 52091915 ],
                [ ' Shame ' , 13912841 ],
                [ ' Blue Is the Warmest Colour ' , 15465835 ],
                [ ' The Dreamers ' , 307113 ],
                [ ' Shame ' , 13912841 ],
                [ ' Blue Is the Warmest Colour ' , 15390895 ],
                [ ' Blue Valentine ' , 15566240 ],
                [ ' Two Girls and a Guy ' , 1315026 ],
                [ ' Elles ' , 256669 ],
                [ ' Se, jie ' , 50167430 ],
                [ ' The Evil Dead ' , 2311944 ],
                [ ' Shame ' , 13912841 ],
                [ ' Arabian Nights ' , 2548651 ],
                [ ' Natural Born Killers ' , 16283563 ],
                [ ' Clerks ' , 3664240 ],
                [ ' Bad Lieutenant ' , 1038916 ],
                [ ' Beyond the Valley of the Dolls ' , 8000000 ],
                [ ' Kids ' , 18912216 ],
                [ ' Crash ' , 94673038 ],
                [ ' Last Tango in Paris ' , 34897711 ],
                [ ' Pink Flamingos ' , 401802 ],
                [ ' Lust, Caution  ' , 50167430 ],
                [ ' Happiness 1998 ' , 3546453 ],
                [ ' Whore 1991 ' , 958404 ],
                [ ' Law of Desire ' , 858737 ],
            ],
            pointStart: 59
        }, {
            type: 'line',
            name: 'System Rating NC-17',
            data: [
                759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 
                759820867, 759820867, 759820867, 759820867, 759820867, 759820867,  
                759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 
                759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 
                759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 
                759820867, 759820867, 759820867, 759820867
            ],
            pointStart: 59,
            color: '#d61111'
        }, {
            name: 'System Rating PG',
            color: '#a10505',
            data: [
                [ ' Hugo ' , 47784 ],
                [ ' Dolphin Tale ' , 59068724 ],
                [ ' Wonder ' , 284604712 ],
                [ ' The Last Song ' , 72678948 ],
                [ ' War Room ' , 70975239 ],
                [ ' The Lunchbox ' , 10531500 ],
                [ ' Somewhere in Time ' , 4609597 ],
                [ ' Urban Cowboy ' , 36918287 ],
                [ ' Cinderella ' , 447351353 ],
                [ ' War Room ' , 70986904 ],
                [ ' Wonder ' , 285937718 ],
                [ ' Little Women ' , 176601214 ],
                [ ' Overcomer ' , 33102988 ],
                [ ' The Jazz Singer ' , 26696000 ],
                [ ' A Walk to Remember ' , 35694916 ],
                [ ' Tuck Everlasting ' , 4344615 ],
                [ ' Dreamer ' , 6741732 ],
                [ ' The Lake House ' , 74830111 ],
                [ ' Akeelah and the Bee ' , 10948425 ],
                [ ' Bridge to Terabithia ' , 120587063 ],
                [ ' August Rush ' , 34605762 ],
                [ ' Fireproof ' , 32973297 ],
                [ ' The Last Song ' , 69137047 ],
                [ ' God\'s Not Dead ' , 62667874 ],
                [ ' Mr. Holland\'s Opus ' , 83269971 ],
                [ ' Phenomenon ' , 120036382 ],
                [ ' Contact ' , 81120329 ],
                [ ' The Spanish Prisoner ' , 3835130 ],
                [ ' Sense and Sensibility ' , 118582776 ],
                [ ' The Secret of Roan Inish ' , 3101815 ],
                [ ' The Remains of the Day ' , 48954968 ],
                [ ' Pure Country ' , 5164458 ],
                [ ' Forever Young ' , 107956187 ],
                [ ' A River Runs Through It ' , 31440294 ],
                [ ' Honeysuckle Rose ' , 12815212 ],
                [ ' Resurrection ' , 150297525 ],
                [ ' Taps ' , 21856053 ],
                [ ' On Golden Pond ' , 104285432 ],
                [ ' Absence of Malice ' , 28716963 ],
                [ ' The Night the Lights Went Out in Georgia ' , 7423752 ],
                [ ' Rocky III ' , 108052686 ],
                [ ' Tex ' , 544368315 ],
                [ ' Staying Alive ' , 42892670 ],
                [ ' Tender Mercies ' , 3943124 ],
                [ ' Footloose ' , 71808942 ],
                [ ' The Natural ' , 20000000 ],
    
    
            ],
            pointStart: 96
        }, {
            type: 'line',
            name: 'System Rating PG',
            data: [
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,  
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,  
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 
                3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 
                3752564794, 3752564794, 3752564794, 3752564794,
            ],
            pointStart: 96,
            color: '#a10505'
        }, {
            name: 'System Rating PG-13',
            color: '#7a2f2f',
            data: [
                [ ' Gravity ' , 583698673 ],
                [ ' Sing ' , 559454789 ],
                [ ' Contagion ' , 77551594 ],
                [ ' Burlesque ' , 35552675 ],
                [ ' Creed II ' , 163591522 ],
                [ ' The Post ' , 129748880 ],
                [ ' Hereafter ' , 58660270 ],
                [ ' Anna Karenina ' , 22004627 ],
                [ ' Arrival ' , 156127894 ],
                [ ' Charlie St. Cloud ' , 4478084 ],
                [ ' Bridge of Spies ' , 122498338 ],
                [ ' The Impossible ' , 129590606 ],
                [ ' Water for Elephants ' , 78809717 ],
                [ ' Creed ' , 136567581 ],
                [ ' The Rite ' , 60143987 ],
                [ ' Collateral Beauty ' , 49309093 ],
                [ ' True Grit ' , 217276928 ],
                [ ' The Tree of Life ' , 26721826 ],
                [ ' The Longest Ride ' , 29802928 ],
                [ ' Step Up Revolution ' , 132552290 ],
                [ ' The Vow ' , 167618160 ],
                [ ' The Age of Adaline ' , 38984536 ],
                [ ' Safe Haven ' , 66050951 ],
                [ ' The Best of Me ' , 15059418 ],
                [ ' The Help ' , 188120004 ],
                [ ' Dear John ' , 117033509 ],
                [ ' The Lucky One ' , 71633833 ],
                [ ' The Giver ' , 41540205 ],
                [ ' Draft Day ' , 4847480 ],
                [ ' Rings ' , 57917283 ],
                [ ' Fences ' , 40282881 ],
                [ ' Me Before You ' , 188265198 ],
                [ ' The Light Between Oceans ' , 2281732 ],
                [ ' The Book Thief ' , 57086711 ],
                [ ' A Quiet Place ' , 317522294 ],
                [ ' Beastly ' , 21028230 ],
                [ ' The Roommate ' , 36545707 ],
                [ ' Remember Me ' , 40506120 ],
                [ ' The Woman in Black ' , 113955898 ],
                [ ' Country Strong ' , 5601987 ],
                [ ' One Day ' , 44168692 ],
                [ ' Suffragette ' , 20044909 ],
                [ ' The Perks of Being a Wallflower ' , 20069303 ],
                [ ' Project Almanac ' , 20909437 ],
                [ ' Wish Upon ' , 11477345 ],
                [ ' If I Stay ' , 67356170 ],
                [ ' Brooklyn ' , 51076141 ],
                [ ' Everything, Everything ' , 51603136 ],
                [ ' Mud ' , 21556959 ],
                [ ' Amour ' , 27087044 ],
                [ ' Ouija: Origin of Evil ' , 72831866 ],
                [ ' Black or White ' , 12971021 ],
                [ ' The Bye Bye Man ' , 23787727 ],
                [ ' Gifted ' , 29964656 ],
                [ ' The Words ' , 10369708 ],
                [ ' Lights Out ' , 143806510 ],
                [ ' Still Alice ' , 36699612 ],
                [ ' Before I Fall ' , 13945682 ],
                [ ' Rabbit Hole ' , 1205034 ],
                [ ' Ida ' , 12698355 ],
                [ ' Courageous ' , 33185884 ],
                [ ' Mustang ' , 4152584 ],
                [ ' Like Crazy ' , 3478400 ],
                [ ' Another Earth ' , 1927779 ]
            ],
            pointStart: 150
        }, {
            type: 'line',
            name: 'System Rating PG-13',
            data: [
                5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,  
                5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,  
                5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 
                5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 
                5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 
                5102398393, 5102398393, 5102398393, 5102398393,
            ],
            pointStart: 150,
            color: '#7a2f2f',
        },{
            name: 'System Rating G',
            color: '#4d0909',
            borderWidth: 1,
            data: [
               
                [ ' A Sunday in the Country ' , 1711143 ],
                [ ' Prancer ' , 11587135 ],
                [ ' The Rookie ' , 58693537 ],
                [ ' Beauty and the Beast 1991 ' , 418656843 ],
                [ ' The Little Rascals ' , 43947950 ],
                [ ' Ramona and Beezus ' , 12469621 ],
                [ ' The Black Stallion ' , 35099643 ],
                [ ' The Hunchback of Notre Dame ' , 255500000 ],
                [ ' Babe ' , 216100000 ],
                [ ' Pollyanna ' , 1250000 ],
                [ ' Lassie Come Home ' , 3851000 ],
                [ ' Charlotte\'s Web ' , 58985708 ],
                [ ' Kit Kittredge: An American Girl ' , 7657973 ],
                [ ' The Rookie ' , 58491516 ],
                [ ' The Secret Garden ' , 293281000 ],
                [ ' The Sound of Music ' , 278014195 ],
                [ ' The Tale of Despereaux ' , 30482317 ],
                [ ' Bambi 1942 ' , 267142000 ],
                [ ' My Fair Lady 1964 ' , 55071636 ],
                [ ' Hachiko: A Dog\'s Story ' , 37707417 ],
                [ ' Giant ' , 23794409 ],
                [ ' The Ten Commandments 1966 ' , 52500000 ],
                [ ' The Quiet Man ' , 5850377 ],
                [ ' Three Coins in the Fountain ' , 10300000 ],
    
            ],
            pointStart:216
        }, {
            type: 'line',
            name: 'System Rating G',
            data: [
                3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 
                3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288,
                3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 
                3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288
            ],
            pointStart: 216,
            color: '#4d0909'
        }]
    });
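
Each flat `type: 'line'` series above repeats a single value once per point so that the line spans its rating group. As a minimal sketch (the helper name `constantLineSeries` is hypothetical and not part of the notebook), the same series object could be generated with `Array.prototype.fill` instead of typing the value out by hand. The count of 25 used below is an assumption; set it to however many points the line has to cover.

```javascript
// Hypothetical helper, not part of the original notebook: builds a flat
// reference-line series by repeating `value` for `count` points.
function constantLineSeries(name, value, count, pointStart, color) {
    return {
        type: 'line',
        name: name,
        data: Array(count).fill(value), // one copy of the value per point
        pointStart: pointStart,
        color: color
    };
}

// Example with the 'System Rating G' settings; the count (25) is assumed.
const gLine = constantLineSeries('System Rating G', 3179360288, 25, 216, '#4d0909');
console.log(gLine.data.length, gLine.data[0]);
```

The returned object can be placed directly in the `series` array of a `Highcharts.chart` call.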
    
    In [27]:
    %%js
    Highcharts.chart('x',{
        chart: {
            width: 900,
            height: 350
        },
        title:{
            text:"What Movie Is The Most Successful?"
        },
        xAxis:{
            categories:['The Lion King 1994 | 1st Highest', 'Gravity | 2nd Highest', 'Sing | 3rd Highest', 'Tex | 4th Highest', 
                        'Fifty Shades of Grey | 5th Highest', 'Cinderella | 6th Highest', 'Beauty and the Beast 1991 | 7th Highest', 
                        'Django Unchained | 8th Highest', 'Fifty Shades Darker | 9th Highest', 'Black Swan | 10th Highest'],
            crosshair:{
                enabled:true
            },
            labels:{
                enabled:false
            }
        },
        yAxis:{
            min:0,
            max:1000000000,
            // `step` is not a yAxis option; tickInterval sets the tick spacing
            tickInterval:250000000
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        plotOptions:{
            series:{
                marker:{
                    states:{
                        hover:{
                            radiusPlus:12,
                            lineWidthPlus:5
                        }
                    }
                },
                // Series hover state; a top-level `states` key is not a chart option
                states:{
                    hover:{
                        lineWidthPlus:10
                    }
                }
            }
        },
        tooltip:{
            shared:false
        },
        series:[{
            type:'column',
            color:'#C21602',
            name:'Profit',
            data:[941214868.0, 583698673.0, 559454789.0, 544368315.0, 530998101.0, 447351353.0, 
                  418656843.0, 349948323.0, 326398492.0, 318266710.0]
        },{
            type:'column',
            color:'#F88379',
            name:'Revenue',
            data:[986214868, 693698673, 634454789, 549368315, 570998101, 542351353, 
                  438656843, 449948323, 381398492, 331266710]
        },{
            type:'spline',
            color:'gold',
            name:'Cost',
            data:[45000000.0, 110000000.0, 75000000.0, 5000000.0, 40000000.0, 
                  95000000.0, 20000000.0, 100000000.0, 55000000.0, 13000000.0],
            marker:{
                lineWidth: 2,
                lineColor: 'gold',
                fillColor: 'white',
                radius:2
            }
       }]
    });
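
The Profit columns in the chart above appear to equal Revenue minus Cost for every film. The notebook does not state the formula, so the following is only a quick check under that assumption (variable names are illustrative). It recomputes profit from the other two series and compares it with the plotted values.

```javascript
// Values copied from the Revenue, Cost and Profit series of the chart above.
const revenue = [986214868, 693698673, 634454789, 549368315, 570998101,
                 542351353, 438656843, 449948323, 381398492, 331266710];
const cost = [45000000, 110000000, 75000000, 5000000, 40000000,
              95000000, 20000000, 100000000, 55000000, 13000000];
const plottedProfit = [941214868, 583698673, 559454789, 544368315, 530998101,
                       447351353, 418656843, 349948323, 326398492, 318266710];

// Assumption: profit = revenue - cost.
const derivedProfit = revenue.map((r, i) => r - cost[i]);

// True when every derived value matches the plotted one.
const allMatch = derivedProfit.every((p, i) => p === plottedProfit[i]);
console.log(allMatch);
```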
    
    In [32]:
    %%js
    (function (H) {
        H.addEvent(H.Axis, 'afterInit', function () {
            const logarithmic = this.logarithmic;
    
            // Guard against axes that define no `custom` block
            if (logarithmic && this.options.custom && this.options.custom.allowNegativeLog) {
    
                // Avoid errors on negative numbers on a log axis
                this.positiveValuesOnly = false;
    
                // Override the converter functions
                logarithmic.log2lin = num => {
                    const isNegative = num < 0;
    
                    let adjustedNum = Math.abs(num);
    
                    if (adjustedNum < 10) {
                        adjustedNum += (10 - adjustedNum) / 10;
                    }
    
                    const result = Math.log(adjustedNum) / Math.LN10;
                    return isNegative ? -result : result;
                };
    
                logarithmic.lin2log = num => {
                    const isNegative = num < 0;
    
                    let result = Math.pow(10, Math.abs(num));
                    if (result < 10) {
                        result = (10 * (result - 1)) / (10 - 1);
                    }
                    return isNegative ? -result : result;
                };
            }
        });
    }(Highcharts));
    Highcharts.chart('n',{
            chart: {
            width: 900,
            height: 300
        },
            title:{
                text:""
            },
            xAxis:{
               categories:['The Lion King 1994 | 1st Highest', 'Gravity | 2nd Highest', 'Sing | 3rd Highest', 'Tex | 4th Highest', 
                        'Fifty Shades of Grey | 5th Highest', 'Cinderella | 6th Highest', 'Beauty and the Beast 1991 | 7th Highest', 
                        'Django Unchained | 8th Highest', 'Fifty Shades Darker | 9th Highest', 'Black Swan | 10th Highest'],
               crosshair:{
                   enabled:true
               },
               labels:{
                   enabled:true
               } 
            },
        yAxis: {
            type: 'logarithmic',
            custom: {
                allowNegativeLog: true
            }
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true,
                    // dataLabels take a format string; valueSuffix is a tooltip option
                    format: '{y}%'
                }
            },
            series: {
                dataLabels: {
                    enabled: true,
                    format: '{y}%',
                    style: {
                        textOutline: 'none',
                        fontWeight: 'bold'
                    }
                }
            }
        },
           tooltip:{
               valueSuffix:'%',
               shared:true
           },
           series:[{
               type:'column',
               color:'#F57070',
               name:'Net Profit Margin',
               data:[95, 84, 88, 99, 93, 82, 95, 78, 86, 96]
           },{
               type:'column',
               color:'#EC0303',
               name:'Return On Investment Percentage',
               data:[2092, 531, 746, 10887, 1327, 471, 2093, 350, 593, 2448]
           }]  
        
       });
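
The two percentages plotted above are consistent with the usual definitions: Net Profit Margin = Profit / Revenue and Return On Investment = Profit / Cost, each rounded to a whole percent. The notebook does not show how they were computed, so treat the following as a sketch under that assumption; `toPercent` is a hypothetical helper.

```javascript
// Profit, Revenue and Cost for the ten films, as plotted in the previous chart.
const profit = [941214868, 583698673, 559454789, 544368315, 530998101,
                447351353, 418656843, 349948323, 326398492, 318266710];
const revenue = [986214868, 693698673, 634454789, 549368315, 570998101,
                 542351353, 438656843, 449948323, 381398492, 331266710];
const cost = [45000000, 110000000, 75000000, 5000000, 40000000,
              95000000, 20000000, 100000000, 55000000, 13000000];

// Hypothetical helper: ratio expressed as a whole-number percentage.
const toPercent = (numerator, denominator) =>
    Math.round((100 * numerator) / denominator);

// Assumed definitions: NPM = profit / revenue, ROI = profit / cost.
const netProfitMargin = profit.map((p, i) => toPercent(p, revenue[i]));
const returnOnInvestment = profit.map((p, i) => toPercent(p, cost[i]));

console.log(netProfitMargin);
console.log(returnOnInvestment);
```

Because ROI divides by cost, a low-budget film can post a far larger ROI than a blockbuster with a similar margin, which is why the chart uses a logarithmic y-axis.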
    
    In [33]:
    %%js
    Highcharts.chart('no',{
        chart: {
            width: 900,
            height: 350
        },
        title:{
            text:"What Movie Is The Most Successful?"
        },
        xAxis:{
            categories:['A Quiet Place | 11th Highest', 'Fifty Shades Freed | 12th Highest', 'Gone Girl | 13th Highest', 'The Secret Garden | 14th Highest', 
                        'Wonder | 15th Highest', 'Wonder | 16th Highest', 'The Sound of Music | 17th Highest', 'Bambi 1942 | 18th Highest', 
                        'The Hunchback of Notre Dame | 19th Highest', 'True Grit | 20th Highest'],
            crosshair:{
                enabled:true
            },
            labels:{
                enabled:false
            }
        },
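        yAxis:{
            min:0,
            max:400000000,
            // `step` is not a yAxis option; tickInterval sets the tick spacing
            // (100 million chosen here so that the ticks divide the 400 million maximum evenly)
            tickInterval:100000000
        },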
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        plotOptions:{
            series:{
                marker:{
                    states:{
                        hover:{
                            radiusPlus:12,
                            lineWidthPlus:5
                        }
                    }
                },
                // Series hover state; a top-level `states` key is not a chart option
                states:{
                    hover:{
                        lineWidthPlus:10
                    }
                }
            }
        },
        tooltip:{
            shared:false
        },
        series:[{
            type:'column',
            color:'#C21602',
            name:'Profit',
            data:[317522294.0, 316350619.0, 307567189.0, 293281000.0, 285937718.0, 284604712.0, 278014195.0, 267142000.0, 255500000.0, 217276928.0]
        },{
            type:'column',
            color:'#F88379',
            name:'Revenue',
            data:[334522294, 371350619, 368567189, 311281000, 305937718, 304604712, 286214195, 268000000, 325500000, 252276928]
        },{
            type:'spline',
            color:'gold',
            name:'Cost',
            data:[17000000.0, 55000000.0, 61000000.0, 18000000.0, 20000000.0, 20000000.0, 8200000.0, 858000.0, 70000000.0, 35000000.0],
            marker:{
                lineWidth: 2,
                lineColor: 'gold',
                fillColor: 'white',
                radius:2
            }
       }]
    });
    
    In [30]:
    %%js
    (function (H) {
        H.addEvent(H.Axis, 'afterInit', function () {
            const logarithmic = this.logarithmic;
    
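            // Guard against axes that define no `custom` block (the yAxis below has none)
            if (logarithmic && this.options.custom && this.options.custom.allowNegativeLog) {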
    
                // Avoid errors on negative numbers on a log axis
                this.positiveValuesOnly = false;
    
                // Override the converter functions
                logarithmic.log2lin = num => {
                    const isNegative = num < 0;
    
                    let adjustedNum = Math.abs(num);
    
                    if (adjustedNum < 10) {
                        adjustedNum += (10 - adjustedNum) / 10;
                    }
    
                    const result = Math.log(adjustedNum) / Math.LN10;
                    return isNegative ? -result : result;
                };
    
                logarithmic.lin2log = num => {
                    const isNegative = num < 0;
    
                    let result = Math.pow(10, Math.abs(num));
                    if (result < 10) {
                        result = (10 * (result - 1)) / (10 - 1);
                    }
                    return isNegative ? -result : result;
                };
            }
        });
    }(Highcharts));
    Highcharts.chart('xo',{
            chart: {
            width: 900,
            height: 310
        },
            title:{
                text:""
            },
            xAxis:{
               categories:['A Quiet Place | 11th Highest', 'Fifty Shades Freed | 12th Highest', 'Gone Girl | 13th Highest', 'The Secret Garden | 14th Highest', 
                        'Wonder | 15th Highest', 'Wonder | 16th Highest', 'The Sound of Music | 17th Highest', 'Bambi 1942 | 18th Highest', 
                        'The Hunchback of Notre Dame | 19th Highest', 'True Grit | 20th Highest'],
               crosshair:{
                   enabled:true
               },
               labels:{
                   enabled:true
               } 
            },
        yAxis: {
            type: 'logarithmic'
        },
        legend: {
            enabled: true,
            verticalAlign: 'bottom',
            symbolRadius: 20,
            reversed: true
        },
        plotOptions: {
            bar: {
                dataLabels: {
                    enabled: true
                }
            },
            series: {
                dataLabels: {
                    enabled: true,
                    style: {
                        textOutline: 'none',
                        fontWeight: 'bold'
                    }
                }
            }
        },
           tooltip:{
               valueSuffix:'%',
               shared:true
           },
           series:[{
               type:'column',
               color:'#F57070',
               name:'Net Profit Margin',
               data:[95, 85, 83, 94, 93, 93, 97, 100, 78, 86]
           },{
               type:'column',
               color:'#EC0303',
               name:'Return On Investment Percentage',
               data:[1868, 575, 504, 1629, 1430, 1423, 3390, 31135, 365, 621]
           }]  
        
       });
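
The axis override registered at the top of the log-scale cells replaces the axis converters with a pair that is symmetric around zero and switches to a linear ramp for magnitudes below 10. To see what the mapping does without loading Highcharts, the two functions can be copied out as plain functions. This is an illustrative sketch: the standalone names `log2lin` and `lin2log` mirror the properties overridden above, and `roundTrip` is a hypothetical helper.

```javascript
// Standalone copy of the value -> axis-position converter used in the override.
function log2lin(num) {
    const isNegative = num < 0;
    let adjustedNum = Math.abs(num);
    // Below 10, blend linearly so that 0 maps to log10(1) = 0 instead of -Infinity.
    if (adjustedNum < 10) {
        adjustedNum += (10 - adjustedNum) / 10;
    }
    const result = Math.log(adjustedNum) / Math.LN10;
    return isNegative ? -result : result;
}

// Standalone copy of the inverse converter (axis position -> value).
function lin2log(num) {
    const isNegative = num < 0;
    let result = Math.pow(10, Math.abs(num));
    // Undo the linear blend applied to magnitudes below 10.
    if (result < 10) {
        result = (10 * (result - 1)) / (10 - 1);
    }
    return isNegative ? -result : result;
}

// Hypothetical helper: converting to axis space and back should return the input.
const roundTrip = value => lin2log(log2lin(value));

[-1000, -5, 0, 5, 95, 2092].forEach(v => {
    console.log(v, log2lin(v), roundTrip(v));
});
```

Positive and negative inputs of equal magnitude land at mirrored positions, so negative values could be drawn on the same axis as the large positive ones.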
    